Zanran’s extraction of tables from PDF files into Excel is very good, but it will not be 100% accurate across thousands of PDF files. When operating at large scale, there is always room for small errors.
To alleviate this problem, we have built Zanran’s PDF Workbench, which allows a human operator to check and amend the extracted tables quickly and easily.
The operator views the original PDF pages overlaid with a computer-generated grid.
The operator can select any cell in the grid to split it, or any group of cells to combine them. Similarly, the operator can select whole rows or columns (see the grey areas at the top and left) to merge or split them.
In addition, the operator can add labels to the headers – to make it clear what to extract.
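To make the merge and split operations concrete, here is a minimal sketch of one way such a grid could be modelled. It is not Zanran's implementation: the `Grid` class, its method names and the idea of storing the grid as lists of boundary positions are all assumptions made for illustration. Splitting a column adds a boundary and merging two columns removes the one they share; rows would work the same way.

```python
from bisect import insort


class Grid:
    """Hypothetical table grid stored as sorted column and row boundary positions."""

    def __init__(self, col_edges, row_edges):
        self.col_edges = sorted(col_edges)
        self.row_edges = sorted(row_edges)

    def split_column(self, x):
        """Add a vertical boundary at x, splitting the column that contains it."""
        if not self.col_edges[0] < x < self.col_edges[-1]:
            raise ValueError("split position lies outside the grid")
        if x not in self.col_edges:
            insort(self.col_edges, x)

    def merge_columns(self, index):
        """Merge column `index` with its right-hand neighbour by removing their shared edge."""
        if not 0 <= index < len(self.col_edges) - 2:
            raise IndexError("no column to the right to merge with")
        del self.col_edges[index + 1]

    def shape(self):
        """Return (number of rows, number of columns)."""
        return len(self.row_edges) - 1, len(self.col_edges) - 1


# Example: a 2 x 3 grid, in arbitrary page units (illustrative values only).
grid = Grid(col_edges=[0, 100, 200, 300], row_edges=[0, 20, 40])
grid.split_column(150)   # the middle column becomes two columns
grid.merge_columns(0)    # the first two columns become one
```

Storing boundaries rather than individual cells keeps row and column edits simple. Merging or splitting single cells would need extra span information that this sketch leaves out.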
After editing, the operator saves the results as an XML file, which is then used to generate the Excel worksheet.
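As a rough illustration of that last step, the sketch below reads a saved table from XML and turns it into rows ready for a spreadsheet. The XML layout shown (`table`, `row` and `cell` elements) is invented for this example and is not Zanran's actual format. The output is CSV text, used here only as a stand-in for the Excel worksheet so that the example needs nothing beyond the Python standard library.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical XML layout, assumed for illustration only.
SAMPLE_XML = """
<table>
  <row header="true"><cell>Year</cell><cell>Revenue</cell></row>
  <row><cell>2010</cell><cell>1.2</cell></row>
  <row><cell>2011</cell><cell>1.5</cell></row>
</table>
"""


def rows_from_xml(xml_text):
    """Return the table as a list of rows, each a list of cell strings."""
    root = ET.fromstring(xml_text.strip())
    return [
        [(cell.text or "").strip() for cell in row.findall("cell")]
        for row in root.findall("row")
    ]


def rows_to_csv(rows):
    """Serialise rows as CSV text (a stand-in for writing an Excel worksheet)."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


rows = rows_from_xml(SAMPLE_XML)
print(rows_to_csv(rows))
```

A real pipeline would pass the same rows to a spreadsheet library instead of the CSV writer.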
If you need very high accuracy in your data extraction, Zanran’s Workbench provides a useful manual checking stage to improve the quality of the Excel file.