The properties are as follows:
id: A unique identifier for this file.
filetype: The type of the file, as determined by the API from the contents of the file. This will have one of the values listed in the table below.
size: The size of the file in bytes.
sha256: The SHA-256 hash of the file contents.

It is best practice to calculate your own values for size, sha256 and filetype (which in most cases will be a static value) for the file you are submitting, and to compare them with the values in the response to ensure that the file was not corrupted during transmission.

The request body should specify the URL from which Waives can download the contents of the document's file. The Content-Type header must be set to application/json; if it is excluded, the request will not be treated as an import. Only HTTP and HTTPS schemes are allowed (HTTPS is strongly recommended). The download of the file must succeed within 10 seconds; otherwise a 422 Unprocessable Entity is returned. The 422 response is returned in a few cases, such as when the download fails or the JSON body does not match the required schema.
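The two client-side checks described above (allowing only HTTP and HTTPS import URLs, and comparing your own size and sha256 values with those in the response) could be sketched as follows. This is a minimal illustration rather than official client code: the helper names are hypothetical, the file properties are assumed to be already parsed into a dictionary, and the sha256 value is assumed to be a hexadecimal string.

```python
import hashlib
from urllib.parse import urlsplit

# Header the import request must carry, per the text above.
IMPORT_HEADERS = {"Content-Type": "application/json"}


def check_import_url(url: str) -> str:
    """Return the URL unchanged if its scheme is allowed, else raise ValueError."""
    scheme = urlsplit(url).scheme.lower()
    if scheme not in ("http", "https"):
        raise ValueError(f"unsupported scheme {scheme!r}: only HTTP and HTTPS are allowed")
    return url


def file_property_mismatches(data: bytes, reported: dict) -> list:
    """Compare locally computed size and SHA-256 with the reported values.

    Assumption: `reported` is the parsed file properties object and its
    sha256 value is a hexadecimal string (compared case-insensitively).
    Returns the names of the properties that do not match.
    """
    expected_size = len(data)
    expected_hash = hashlib.sha256(data).hexdigest()
    mismatches = []
    if reported.get("size") != expected_size:
        mismatches.append("size")
    if str(reported.get("sha256", "")).lower() != expected_hash:
        mismatches.append("sha256")
    return mismatches
```

An empty list from file_property_mismatches means the values you calculated agree with the ones the API reported; anything else suggests the file should be resubmitted.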
The reason for the 422 response is provided in the response body.

The newly created document resource is returned, along with a 201 Created status. The document resource includes the document's ID, which can then be used with the Get, Read, Classify, Extract Document Data, Get Redacted PDF and Delete endpoints.

A separate article contains details of all file types supported by Waives, and the maximum file size.

Note that only documents created from PDFs containing images (i.e.

The properties of a result are:
text: The text of the result.
value: The value as a non-text type (e.g.
This endpoint can also be used to obtain a response that can be passed directly to the redaction endpoint to get a PDF with all extracted data redacted. If the Accept header is application/vnd.waives.requestformats.redact+json, the response you receive will be a redaction request that will redact all data extracted from the document. You can either send this directly in a request to the redaction endpoint or edit it first. One redaction mark is created for every non-empty result and alternative result for every field. Each redaction mark is labelled with the extraction field it came from, which helps if you want to edit the request, for example by removing marks for specific fields.

The applymarks property controls how redactions are made in the PDF.
If applymarks is true (the default), then as well as a redaction object being added to the PDF, the image underlying each field area is replaced with a black rectangle and any text in that area is removed. The redaction is permanent and cannot be undone if the PDF is loaded into a PDF editor such as Adobe Acrobat.
If applymarks is false, a redaction object is added to the PDF but the image and any text in the PDF are left unaltered. The redaction can be reviewed and accepted or deleted in a PDF editor such as Adobe Acrobat. Accepting the redaction in that tool will alter the image and remove the text.

In most cases you will want to redact areas corresponding to the locations of data extracted using the Extract document data endpoint. Rather than building a redaction request manually, you can request a response from that endpoint that you can pass straight to this endpoint. Simply make a request to the Extract document data endpoint, specifying an Accept header with the value application/vnd.waives.requestformats.redact+json. The response you receive will be a redaction request that will redact all data extracted from the document. You can either send this directly in a request to this endpoint or edit it first.
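The "request it, then optionally edit it" workflow described above might look like the sketch below. The Accept header value and the applymarks property come from the text; everything else is an assumption made for illustration. In particular, the layout of the redaction request is not described here, so the key holding the list of marks ("marks"), the key holding each mark's field label ("label"), and the placement of applymarks at the top level of the request are hypothetical and should be checked against a real response.

```python
import copy

# Accept header value from the text above: asks the Extract document data
# endpoint to return a ready-made redaction request.
REDACT_ACCEPT = "application/vnd.waives.requestformats.redact+json"


def extraction_headers_for_redaction() -> dict:
    """Headers for an extraction request that should return a redaction request."""
    return {"Accept": REDACT_ACCEPT}


def edit_redaction_request(request: dict, drop_fields=(), apply_marks: bool = True,
                           marks_key: str = "marks", label_key: str = "label") -> dict:
    """Return an edited copy of a redaction request.

    marks_key and label_key are HYPOTHETICAL key names; adjust them to the
    structure actually returned by the API. The original request is not modified.
    """
    edited = copy.deepcopy(request)
    # Keep only the marks whose field label is not in drop_fields.
    edited[marks_key] = [
        mark for mark in edited.get(marks_key, [])
        if mark.get(label_key) not in drop_fields
    ]
    # Assumed to be a top-level property of the request.
    edited["applymarks"] = apply_marks
    return edited
```

Setting apply_marks to False corresponds to the reviewable behaviour described above, where the redaction objects are added but the image and text are left unaltered until accepted in a PDF editor.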
Each redaction field is labelled with the extraction field it came from, which helps if you want to edit the request, for example by removing some fields.

Once samples have been added to a classifier, the classifier must be 'trained'. During this process the classifier analyses the samples and determines the defining characteristics of each document type. Training can only be done when there are non-empty samples of at least two document types.

For optimal performance of requests to this endpoint, you should train only once, when all the samples you intend to add have been added. Training multiple times does no harm but makes requests slower.

The retrain query parameter controls whether training happens after the sample is added. When starting from a new (empty) classifier, you must always set retrain=false for the first samples, until you have added samples for at least two document types. Ideally, set retrain=false for every sample except the very last one you want to add, so that training is performed only once.
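The "train only once, on the last sample" advice can be expressed as a small planning helper. This is a sketch under stated assumptions: the function names are hypothetical, the input is simply the list of document types of the (non-empty) samples in the order you intend to add them, and retrain=true is assumed to be the counterpart of the retrain=false value mentioned above.

```python
def retrain_flags(sample_types: list) -> list:
    """Decide the retrain value for each sample to be added to an empty classifier.

    Every sample gets False except the last, so training happens exactly once.
    Raises ValueError if the samples cover fewer than two document types,
    because training is not possible in that case.
    """
    if len(set(sample_types)) < 2:
        raise ValueError("training needs samples of at least two document types")
    flags = [False] * len(sample_types)
    flags[-1] = True
    return flags


def retrain_query(flag: bool) -> str:
    """Query string fragment for the retrain parameter (retrain=true is assumed)."""
    return "retrain=true" if flag else "retrain=false"


# Example: three samples, two document types; only the last request retrains.
plan = [retrain_query(f) for f in retrain_flags(["invoice", "invoice", "receipt"])]
```

Because the flag for the last sample is only set once at least two document types are present in the whole batch, the rule about always using retrain=false for the first samples of a new classifier is satisfied automatically.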