Zanran Blog

 

PDF-to-Excel - three reasons why extracting tables from PDFs is hard

Posted by Jon Goldhill on 16-Nov-2016 17:35:10

Zanran has had to put a huge amount of effort into its PDF-to-Excel software. What seems intuitive to a human – it “looks like a table” – is full of issues, exceptions and special cases for computer software.

In this, the first of a number of articles about content extraction from PDFs, I want to look at some of the fundamental problems.

Topics: Data Extraction, technology