We are always working to improve our services and provide users with the best possible experience. This page outlines
pertinent information for API users about product changes. Please email us with any feedback or questions.
November 2012 Release
To give iThenticate’s custom-built or MTS integrators adequate time to test our new upload processing system, a beta testing period is available to our customers before we move everyone over to the new process. This beta testing period will last until January 2014. We suggest testing the upload process through your current API setup; we believe that most customers will not need to change their API process in order to use the new upload processing system successfully.
Please download the provided test files and follow the recommended test cases below to ensure that your integration works properly with the new upload processing system.
To test the upload processing system, API integrations need to set an XMLRPC boolean flag called “non_blocking_upload” in the “document.add” method:
<member>
<name>non_blocking_upload</name>
<value><boolean>1</boolean></value>
</member>
This is an XMLRPC structure member to include along with the existing “submit_to”, “folder”, and “uploads” members.
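As an illustration only, the following Python sketch uses the standard library's xmlrpc.client module to serialize a “document.add” request that carries the flag. The function name build_document_add_request is hypothetical, the presence of a “sid” member in the request is an assumption, and the values passed for “submit_to”, “folder”, and “uploads” are placeholders; consult the API guide for the actual contents of those members.

```python
import xmlrpc.client


def build_document_add_request(sid, folder_id, uploads, submit_to):
    """Serialize a document.add call that includes non_blocking_upload.

    Hypothetical helper: the "sid" member and the value types used for
    "submit_to", "folder", and "uploads" are assumptions for illustration.
    """
    params = {
        "sid": sid,                   # assumption: session ID sent with the call
        "submit_to": submit_to,       # existing member, value supplied by caller
        "folder": folder_id,          # existing member
        "uploads": uploads,           # existing member
        "non_blocking_upload": True,  # serialized as <boolean>1</boolean>
    }
    # dumps() expects a tuple of positional parameters; here a single struct.
    return xmlrpc.client.dumps((params,), methodname="document.add")
```

Printing the returned string shows the “non_blocking_upload” member alongside the existing members, which can be compared against the XML fragment above before sending anything to the service.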
The effect of adding this flag is that the “document.add” method will return more quickly. There is no change to the XMLRPC response returned by the “document.add” method.
Your code should then call the “document.get” method and inspect the “is_pending” flag of the response. When this flag is no longer true, the response will contain either a “parts” array that includes the ID of the completed report, or an “error” string element that provides the reason that the document could not be processed.
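The polling step described above could be sketched in Python as follows. This is a minimal sketch, not a reference implementation: the names is_pending and wait_for_document, the fetch callable (which stands in for however your integration calls “document.get”), and the interval and timeout defaults are all assumptions. The flag is normalized because it may arrive as a boolean, an integer, or a '0'/'1' string depending on the XMLRPC client, and a missing flag is treated here as not pending.

```python
import time


def is_pending(response):
    """Return True if the document.get response is still pending.

    Accepts booleans, integers, and '0'/'1' strings. Treating a missing
    flag as "not pending" is an assumption made for this sketch.
    """
    return str(response.get("is_pending", "0")).strip().lower() in ("1", "true")


def wait_for_document(fetch, doc_id, interval=5.0, timeout=600.0, sleep=time.sleep):
    """Call fetch(doc_id) repeatedly until is_pending is no longer true.

    fetch is a hypothetical callable that performs the document.get call
    and returns the response as a dict. Returns the final response, which
    should then contain either "parts" or "error".
    """
    deadline = time.monotonic() + timeout
    while True:
        response = fetch(doc_id)
        if not is_pending(response):
            return response
        if time.monotonic() >= deadline:
            raise TimeoutError(f"document {doc_id} is still pending")
        sleep(interval)
```

The sleep parameter exists only so the loop can be exercised in tests without real delays; production code would leave it at its default.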
The following examples are abstract representations of the XMLRPC data structures returned.
Response from the "document.add" (this is unchanged):
{
'sid' => 'db61dccf6333e34393fb032de89520c7bdb4a075',
'messages' => [
'Uploaded 1 document successfully'
],
'uploaded' => [
{
'filename' => 'example.pdf',
'id' => '9764583',
'mime_type' => 'application/pdf',
'folder' => {
'name' => 'My Documents',
'id' => '133344'
}
}
],
'status' => '200',
'response_timestamp' => '20121106T17:26:43',
'api_status' => '200'
};
Here is a very early response from a "document.get" call after uploading with the "document.add" method. Note that is_pending is true and there is no “parts” array (and the percent_match is an empty string indicating an undefined value).
{
'uploaded_time' => '20121106T17:26:43',
'author_last' => '',
'is_pending' => '1',
'mime_type' => 'application/pdf',
'processed_time' => '20121106T17:26:43',
'percent_match' => '',
'id' => '9764583',
'title' => 'example.pdf',
'author_first' => ''
};
As polling continues the "document.get" method returns more data. Here “parts” are included, but the document is still pending.
{
'uploaded_time' => '20121106T17:26:43',
'author_last' => '',
'is_pending' => '1',
'processed_time' => '20121106T17:26:43',
'percent_match' => '',
'parts' => [
{
'doc_id' => '9764583',
'max_percent_match' => '',
'id' => '9811839',
'score' => '',
'words' => '844'
}
],
'id' => '9764583',
'title' => 'example.pdf',
'author_first' => ''
};
Finally, here is a "document.get" response in which "is_pending" is false and percent_match has a defined value. This indicates that the report is ready for viewing.
{
'uploaded_time' => '20121106T17:21:14',
'author_last' => '',
'is_pending' => '0',
'processed_time' => '20121106T17:21:24',
'percent_match' => '26',
'parts' => [
{
'doc_id' => '9764567',
'id' => '9811823',
'score' => '26',
'words' => '844'
}
],
'id' => '9764567',
'title' => 'example.pdf',
'author_first' => ''
};
This is an example of a “document.get” response after iThenticate has determined that the PDF contained no text (e.g. if the PDF is only an image of a document).
Note that "is_pending" is false and the response includes an "error" element describing the error in human-readable form. The error code can be used by iThenticate technical support to help diagnose the problem:
{
'uploaded_time' => '20121106T17:28:01',
'author_last' => '',
'is_pending' => '0',
'mime_type' => 'application/pdf',
'processed_time' => '20121106T17:28:01',
'error' => 'The document must contain at least 20 words of text to be accepted by the system. Error: -909',
'percent_match' => '',
'id' => '9764588',
'title' => 'just_an_image.pdf',
'author_first' => ''
};
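If your integration logs failures for later discussion with technical support, it may help to separate the numeric code from the message. The Python sketch below assumes that error strings end with "Error: <number>", as in the single example above; that format is inferred from this one sample rather than documented, so the hypothetical split_error function falls back to returning the whole text when the pattern is absent.

```python
import re

# Assumed format, inferred from the one sample response: "... Error: -909"
_ERROR_CODE = re.compile(r"Error:\s*(-?\d+)\s*$")


def split_error(error_text):
    """Split an "error" string into (message, code).

    Returns (message, None) when no trailing "Error: <number>" is found.
    """
    match = _ERROR_CODE.search(error_text)
    if match is None:
        return error_text.strip(), None
    return error_text[:match.start()].strip(), int(match.group(1))
```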
The following zip file contains documents that you can use to test the upload processing system. The files are labeled with the terms “Success” and “Fail” to inform you whether they should upload successfully or produce an error when processing in the new upload system. We recommend testing both types of documents to see how your integration handles both successful and failed uploads.
If you experience errors when uploading files that should succeed, the cause may be how quickly your integration calls the "document.get" method. The new upload system accepts documents much more quickly than before, but text extraction no longer occurs within the "document.add" method. Polling "document.get" until "is_pending" is 0 avoids the problem of expecting the “parts” element when "document.get" is called too soon after a successful "document.add".
Many document errors will now be encountered in the "document.get" method because the system extracts the text after "document.add" succeeds. The previous system ran a much more extensive check within the "document.add" method, which slowed our systems down considerably; this is why we have decoupled the check from "document.add" and now run it within the "document.get" method. Errors will be returned when is_pending = 0.
If your system does run into errors that require an update to your code, you may contact us to request an extension to the Beta period to accommodate your code release schedule.
We have been working to improve the Document Upload Process for iThenticate’s users. We are pleased to be able to provide our API customers with a Beta testing period that ends in January 2014. In January 2014 all of our API integrators will be on the new upload processing system. The following FAQs provide more information about this upgrade.
A number of improvements to the iThenticate Upload Process have been completed. The improvements include faster document processing, improved system responsiveness and reliability, and more detailed error reporting. No changes are required for API customers at this time; however, we encourage API customers to beta test as described below as soon as possible.
Upgrading our servers and adding a new Upload Processing System has increased the service’s speed substantially. Furthermore, the new processing system is required for our next product release, in which we will make the Document Viewer (DV) available within the Similarity Report. The DV allows users to view the uploaded document in its original format, including images, tables and graphs, within the Similarity Report.
Here is an example of what the DV will look like in iThenticate:
The iThenticate upload process has always been asynchronous. Documents are uploaded via the XMLRPC “document.add” method, then the “document.get” method is polled to determine when the document is no longer pending, which indicates that a report has been generated and may be viewed. The changes involve more background processing, and therefore the “document.add” method will return more quickly.
If your API integration watches the “is_pending” flag included in the “document.get” XMLRPC response, you may not require any changes to your code. If your integration expects to see a “parts” element included in the “document.get” response, you may need to update your code. Because the “document.add” method will return faster, the “parts” element may not exist when “document.get” is called soon after the “document.add” method.
In addition, since more processing happens after the “document.add” call has returned, errors that might have been reported during the “document.add” method may now be reported during the “document.get” call.
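For integrations that previously assumed “parts” was always present, one defensive approach is to classify each “document.get” response before reading from it. The following Python sketch is an assumption-laden illustration: the function name document_state and the state labels "pending", "failed", and "ready" are invented here, and the response is assumed to be available as a dict shaped like the abstract examples on this page.

```python
def document_state(response):
    """Classify a document.get response without assuming "parts" exists.

    Returns a (state, report_ids) tuple where state is "pending",
    "failed", or "ready". The labels are hypothetical, not API values.
    """
    # The flag may be a bool, an int, or a '0'/'1' string.
    pending = str(response.get("is_pending", "0")).strip().lower() in ("1", "true")
    if pending:
        # "parts" may be missing, or present with empty scores, at this stage.
        return "pending", []
    if response.get("error"):
        # Errors formerly raised by document.add may now appear here.
        return "failed", []
    ids = [part["id"] for part in response.get("parts") or [] if "id" in part]
    return "ready", ids
```

Checking the pending state first means that a partially populated “parts” array on a pending document is never mistaken for a finished report.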