Academic Publishers


Much academic material has been archived as PDFs, and many new manuscripts are submitted to publishers as PDFs. Yet PDFs are hard to work with because they have no structure or tagging (see Our Technology), so much of the extraction work still has to be done manually.

Zanran provides powerful, scalable PDF processing solutions that help with extracting data from PDF documents or converting them. Some of the principal capabilities are:

    1. Converting PDF to XML
    2. Automated table extraction from PDFs (and extraction of graphs and charts)
    3. Converting PDF files to responsive HTML for viewing on mobile devices
    4. Text mining and extraction
    5. Data point extraction

Zanran’s core technology utilises sophisticated computer-vision algorithms and machine learning to understand the layout of PDF files and bring structure to their unstructured format. Zanran’s software gives businesses faster, cheaper content extraction.

Convert PDFs to XML

If you're looking to convert documents to XML, manual processing can be slow and relatively expensive. Zanran’s PDF to XML process ‘understands’ the layout of the PDF, which makes the subsequent semantic work that much easier. The software can then assign logical XML tags automatically, ready for quality assurance checking by a human operator.
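To make the idea of assigning logical tags from layout more concrete, here is a minimal sketch in Python. It is not Zanran's method: the block dictionaries, the `font_size` and `bold` fields, the size and length thresholds, and the tag names `article`, `title`, `heading` and `para` are all assumptions chosen for illustration. It simply treats the most common font size as body text and tags larger or bold short blocks as titles and headings.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical layout blocks, standing in for the output of a layout-analysis step.
blocks = [
    {"text": "Thermal Limits of Alpine Beetles", "font_size": 18.0},
    {"text": "Abstract", "font_size": 10.0, "bold": True},
    {"text": "Beetles tolerate cold poorly.", "font_size": 10.0},
    {"text": "Warming shifts their range uphill.", "font_size": 10.0},
]

def tag_for(block, body_size):
    """Pick a logical tag from simple layout cues (assumed heuristics)."""
    if block["font_size"] >= body_size * 1.5:
        return "title"
    if block.get("bold") and len(block["text"].split()) <= 8:
        return "heading"
    return "para"

def blocks_to_xml(blocks):
    """Wrap each block in an element named after its guessed logical role."""
    # Treat the most frequent font size as the body-text size.
    body_size = Counter(b["font_size"] for b in blocks).most_common(1)[0][0]
    root = ET.Element("article")
    for block in blocks:
        element = ET.SubElement(root, tag_for(block, body_size))
        element.text = block["text"]
    return root

root = blocks_to_xml(blocks)
print(ET.tostring(root, encoding="unicode"))
```

Tags guessed this way would still need the human quality assurance check described above, since heuristics of this kind can mislabel unusual layouts.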

Efficient text mining & extraction

Where you require textual data for analysis or further processing, Zanran’s technology enables clean extraction of the core text from PDF files, ignoring page numbers, graphs, charts, footnotes, and other elements which are not required.
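As a rough illustration of what ignoring non-core elements can involve, the sketch below cleans text that has already been extracted page by page. It is a simplified stand-in, not Zanran's technology: the function name `clean_pages`, the page-number pattern and the "appears on more than half the pages" rule for running headers are assumptions, and it does not attempt to detect footnotes, graphs or charts.

```python
import re
from collections import Counter

# Matches lines such as "7", "Page 7" or "Page 7 of 12" (assumed formats).
PAGE_NUMBER = re.compile(r"^\s*(page\s+)?\d+(\s+of\s+\d+)?\s*$", re.IGNORECASE)

def clean_pages(pages):
    """Return the core text from a list of per-page strings."""
    split = [[ln.strip() for ln in page.splitlines() if ln.strip()] for page in pages]
    # Count on how many pages each distinct line appears.
    seen = Counter(line for lines in split for line in set(lines))
    threshold = len(pages) / 2
    kept = []
    for lines in split:
        for line in lines:
            if PAGE_NUMBER.match(line):
                continue  # bare page number
            if len(pages) > 1 and seen[line] > threshold:
                continue  # repeated on most pages: likely a running header or footer
            kept.append(line)
    return "\n".join(kept)

pages = [
    "Journal of Examples\nBeetles tolerate cold poorly.\n1",
    "Journal of Examples\nWarming shifts their range uphill.\nPage 2 of 2",
]
print(clean_pages(pages))
```

A rule this simple would also drop a body line that consists only of a number, which is one reason layout-aware extraction is preferable to line-level filtering.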

Extract specific data

If you are looking to extract numerical data from within your PDFs, Zanran’s PDF Data-Point Extraction technology enables you to specify the data you’re looking for using a template. The tables containing the data are automatically extracted into Excel files, and then cross-referenced against the parameters you defined.
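The template idea can be sketched as a lookup over tables that have already been extracted. This is an illustrative assumption rather than Zanran's template format: the `template` structure, the `row` and `column` keys and the function `find_data_points` are hypothetical, and the tables are held as in-memory lists of rows instead of Excel files to keep the example self-contained.

```python
def find_data_points(tables, template):
    """Look up each templated data point by row label and column header."""
    results = {}
    for name, spec in template.items():
        for table in tables:
            if not table:
                continue
            header, *rows = table
            if spec["column"] not in header:
                continue  # this table lacks the requested column
            col = header.index(spec["column"])
            for row in rows:
                if row and row[0] == spec["row"] and col < len(row):
                    # Strip thousands separators before converting to a number.
                    results[name] = float(row[col].replace(",", ""))
                    break
            if name in results:
                break  # first matching table wins
    return results

# Hypothetical extracted table: first row is the header, first column the row label.
tables = [
    [
        ["Region", "2019", "2020"],
        ["Europe", "1,204", "1,310"],
        ["Asia", "2,950", "3,105"],
    ],
]

# Hypothetical template: a name for each data point and where to find it.
template = {
    "asia_2020": {"row": "Asia", "column": "2020"},
    "europe_2019": {"row": "Europe", "column": "2019"},
}

print(find_data_points(tables, template))
```

Data points that cannot be located are simply absent from the result, so a caller can compare the returned keys with the template to see what was missed.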

For more information or to discuss ideas you can contact the Zanran team here.

In the meantime, if you’d like a demo of extracting tables into Excel format, please submit your PDF using the purple 'Start Process!' button.

Convert your PDF