Zanran - PDF Workbench

Zanran’s PDF Workbench lets a user interact with the XML generated by our extraction process.  The software itself runs on any Windows PC or laptop.

It provides a visual interface that makes it easy to:

  • edit/correct the XML
  • enrich the XML with tags or other data
  • perform intelligent, value-added checks & processes on the XML - for an example, see auditors

Here is a screenshot of a page – and the same page viewed in the PDF Workbench.  You can see the Workbench shows the boundaries of the tables and the blocks of text.

Page showing boundaries of tables & text

The PDF Workbench shows an image of the original PDF, overlaid with the XML.  When a user interacts with the overlay, they are changing the XML.

Editing and correcting

Any of the boundaries can be changed manually.  For example, table cells can be split or merged.

If the PDF is a scanned document and the OCR process has not been 100% accurate (e.g. a ‘3’ has been read as an ‘8’), the error can be corrected in the XML using the Workbench.

In each case the changes are written back to the XML files.
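
To illustrate what a correction written back to the XML amounts to, here is a minimal Python sketch. It does not show Zanran's actual XML schema or the Workbench's internals: the element names (page, table, row, cell), the id attribute and the correct_cell helper are all hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML for one extracted table; the real schema may differ.
SAMPLE = """<page number="1">
  <table id="t1">
    <row><cell>Revenue</cell><cell>128</cell></row>
    <row><cell>Costs</cell><cell>45</cell></row>
  </table>
</page>"""


def correct_cell(root, table_id, row_idx, col_idx, new_text):
    """Replace the text of one table cell and return the previous value."""
    table = root.find(f".//table[@id='{table_id}']")
    if table is None:
        raise KeyError(f"no table with id {table_id!r}")
    cell = table.findall("row")[row_idx].findall("cell")[col_idx]
    old_text = cell.text
    cell.text = new_text
    return old_text


root = ET.fromstring(SAMPLE)
# The OCR read the last digit '3' as '8'; put the right value back.
old = correct_cell(root, "t1", 0, 1, "123")
corrected_xml = ET.tostring(root, encoding="unicode")
```

In the Workbench itself the same kind of change is made by clicking on the page image rather than by writing code.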

Workbench-editing

The edited tables can also be output as Excel.

Enriching the XML

New tags or descriptors can be added manually, and existing ones that were added automatically can be checked:

 Workbench-tagging

The tags are added into the XML.
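
As a rough picture of what adding a tag into the XML could look like, the Python sketch below attaches a descriptor to a table. The tag element, its name and source attributes, and the add_tag helper are assumptions made for this example, not Zanran's real format.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML for one extracted table; the real schema may differ.
SAMPLE = '<page><table id="t1"><row><cell>Revenue</cell></row></table></page>'


def add_tag(element, name, value, source="manual"):
    """Attach a descriptor to an element as a child <tag> (assumed layout)."""
    tag = ET.SubElement(element, "tag", {"name": name, "source": source})
    tag.text = value
    return tag


root = ET.fromstring(SAMPLE)
table = root.find(".//table[@id='t1']")
add_tag(table, "category", "income-statement")
enriched_xml = ET.tostring(root, encoding="unicode")
```

Recording a source such as "manual" alongside each tag is one way to tell hand-added tags apart from automatically added ones when they are checked later.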



Zanran’s Workbench is a very flexible package. If you would like to discuss any application, please contact us.