Zanran’s PDF Workbench enables a person to interact with the XML generated by our extraction process. The software itself runs on any Windows PC or laptop.
It provides a visual interface that makes it easy to adjust the boundaries of tables and text blocks, correct OCR errors, and add or check tags.
Here is a screenshot of a page – and the same page viewed in the PDF Workbench. You can see the Workbench shows the boundaries of the tables and the blocks of text.
The PDF Workbench shows an image of the original PDF, overlaid with the XML. When a user interacts with the overlay, they are changing the XML.
Any of the boundaries can be changed manually. For example, in tables, cells can be split or merged.
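To illustrate what merging two cells could mean at the XML level, here is a minimal Python sketch. The element and attribute names (table, row, cell, x0, x1) are assumptions made for this example only; the actual schema produced by the extraction process is not described here.

```python
import xml.etree.ElementTree as ET

# Hypothetical table markup, invented for illustration.
SAMPLE = """<table id="t1">
  <row>
    <cell x0="10" x1="60">Revenue</cell>
    <cell x0="60" x1="90">2010</cell>
    <cell x0="90" x1="120">2011</cell>
  </row>
</table>"""


def merge_cells(row, i):
    """Merge cell i with cell i+1: join the text and widen the boundary."""
    cells = row.findall("cell")
    left, right = cells[i], cells[i + 1]
    # Join the two text fragments, skipping empty cells.
    left.text = " ".join(t for t in (left.text, right.text) if t)
    # The merged cell now extends to the right edge of the removed cell.
    left.set("x1", right.get("x1"))
    row.remove(right)
    return left


root = ET.fromstring(SAMPLE)
row = root.find("row")
merged = merge_cells(row, 1)
print(ET.tostring(root, encoding="unicode"))
```

Splitting a cell would be the reverse operation: insert a new cell element and divide the text and the boundary between the two.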
If the PDF is a scanned document and the OCR process has not been 100% accurate (e.g. a ‘3’ has been read as an ‘8’), the error can be corrected in the XML using the Workbench.
In each case the changes are written back to the XML files.
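As a rough picture of what "written back" involves, the Python sketch below corrects one misread value and saves the file again. The file name, the cell element and the id attribute are hypothetical; the Workbench does this through its visual interface, and its real file layout may differ.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical page XML in which OCR has read "135" as "185".
SAMPLE = '<page><table id="t1"><row><cell id="c3">185</cell></row></table></page>'


def correct_text(path, elem_id, new_text):
    """Replace the text of the element with the given id and save the file."""
    tree = ET.parse(path)
    for elem in tree.iter():
        if elem.get("id") == elem_id:
            elem.text = new_text
            # Write the change back to the same XML file.
            tree.write(path, encoding="utf-8", xml_declaration=True)
            return True
    return False


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "page1.xml")
    with open(path, "w", encoding="utf-8") as f:
        f.write(SAMPLE)
    changed = correct_text(path, "c3", "135")
    # Re-read the file to confirm the correction was persisted.
    corrected = ET.parse(path).getroot().find(".//cell[@id='c3']").text
```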
The edited tables can also be output as Excel.
New tags or descriptors can be added manually, and existing ones that were added automatically can be checked.
The tags are added into the XML.
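A minimal Python sketch of adding a tag to the XML might look like the following. The tags and tag elements and the source attribute (used here to tell automatic tags from manual ones) are assumptions for illustration, not the documented format.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: one tag that was added automatically.
SAMPLE = '<table id="t1"><tags><tag source="auto">finance</tag></tags></table>'


def add_tag(table, label, source="manual"):
    """Add a tag to a table element unless the same label is already present."""
    tags = table.find("tags")
    if tags is None:
        tags = ET.SubElement(table, "tags")
    if any(t.text == label for t in tags.findall("tag")):
        return False
    tag = ET.SubElement(tags, "tag", {"source": source})
    tag.text = label
    return True


table = ET.fromstring(SAMPLE)
added = add_tag(table, "quarterly results")
duplicate = add_tag(table, "finance")
labels = [t.text for t in table.iter("tag")]
```

Recording where each tag came from is one way to support the checking step, since a reviewer can then filter for tags that were added automatically.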