29-Nov-2016

PDF-to-Excel conversion for better financial analysis

When a PLC releases its annual and interim accounts it’s driven by both duty and desire. On the one hand it has an obligation to publish audited figures for the world in general to see, and for its shareholders in particular. On the other, it understandably wishes to present its financial performance in as positive a light as possible. The result is that, in addition to the obligatory tables showing balance sheets, earnings statements and so forth, we’re given statistics showing, say, growth in market share by sector or by geography.

Investors and market watchers depend on all this data, but that doesn’t make them its unquestioning recipients. Of course they need to look closely at those numbers. And of course they want to see whether the company’s interpretation of its performance matches their own.

Problems of manipulation…

But to do this they might want to take that published information out of the report PDF and use it in different ways. For instance, they might want to use stats from the global market share chart – but only those for the Asia-Pacific region. They might want to do some trends analysis. Or they might want to create some what-if scenarios.

The trouble with static data in PDFs is that it’s, well, static. It’s a fixed image. If you want to use it to make a new graph or spreadsheet you have to re-key everything or paste it across one cell at a time.

Unless, that is, you have software that can automate the process. But this too is difficult. Suppose you’re a shareholder and you want to manipulate something like this:

[Image: a table from a PDF report, captioned “Converting PDF to Excel”]

You can easily see that this table is divided into sections, and that the number of columns changes from section to section. So could a piece of software see what you see, and reproduce it faithfully?

Possibly not. The algorithms needed to organise data like this are far from straightforward.
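To give a feel for the problem, here is a minimal sketch in Python of the most naive approach: treat every change in the number of cells per row as the start of a new section. It assumes the rows have already been pulled out of the PDF as lists of cell text, which is itself the hard part, and the function name and sample rows are invented purely for illustration. It is not a description of how any real converter works.

```python
from itertools import groupby


def split_into_sections(rows):
    """Group consecutive rows that have the same number of cells.

    A deliberately naive heuristic: real tables have merged cells,
    blank cells and wrapped text that would defeat it.
    """
    sections = []
    for width, group in groupby(rows, key=len):
        sections.append({"columns": width, "rows": list(group)})
    return sections


# Hypothetical rows, made up for this example only.
rows = [
    ["Region", "2015", "2016"],
    ["Region A", "10", "12"],
    ["Region B", "7", "9"],
    ["Segment", "H1", "H2", "Full year"],
    ["Segment X", "3", "4", "7"],
]

sections = split_into_sections(rows)
for section in sections:
    print(section["columns"], "columns,", len(section["rows"]), "rows")
```

Even this toy version shows where things go wrong: one blank cell in the middle of a section changes the row length and would split that section in two, which is exactly the kind of judgement a human reader makes without thinking.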

What’s more, because a graph or a table is in a PDF, Google/Bing won’t recognise it. A search engine may be able to scan the text in a PDF, but it won’t know the graph or table is there. So if you’re an investor searching Company X’s website for a table showing shipment volume projections you may not find what you need, even though it’s right there in glorious multi-coloured columns. And if you can’t find it, you can’t manipulate it – even if you have a reliable means of doing it.

It’s not just company data that’s hard to manipulate. Analysts’ reports may present the same problems because they, too, are generally rendered as PDFs. Let’s say the analysts have projected stock prices three years into the future weighted against official economic forecasts – but you want to track those prices out by a further two years using the same weighting. Without the right software you’re going to have to re-key all the base data from the PDF before you can apply rules to each cell.

… and a solution

The software we’ve developed here at Zanran makes it much easier to extract data from PDFs and work with it. Is it complicated? Indeed it is – but only if you need to know how it works. (If you’d like an explanation read more here about PDF to Excel conversion.)

But if you don’t need to know what’s under the bonnet it’s very straightforward. It takes the data you need – data that’s bound by the clunky constraints of the PDF format – and releases it into a format that suits you: Excel. It understands and organises the source information as intuitively as you would yourself, and takes away all the hard work of transcription. It even extracts lines and background colours from the PDF and re-applies them to the data once everything has been moved across, to give you the look and feel of the original.

The result? If you’re an investor or a market-watcher or a financial consultant you can take published data and investigate it for yourself without having to jump through time-consuming hoops. You can open up new layers of meaning specific to your own needs and on which you can act – and act profitably.

