“…The extraction of layout elements from articles is an important component of scientific data curation, and the accuracy with which elements such as tables, figures and their captions are extracted has increased significantly over the past several years [4,15,25,51]. A large area of study within document layout analysis is the "mining" of PDFs: newer PDFs are generally in "vector" format (the document is rendered from a set of drawing instructions rather than stored pixel by pixel, as in a raster format), so, in theory, these instructions can be parsed to determine the locations of figures, captions and tables [3,9,23].…”
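To make the idea of "parsing the instructions" concrete, the sketch below walks a tiny, already-decompressed PDF content stream and recovers page-space bounding boxes for drawn XObjects (`Do`, typically images) and rectangles (`re`). It tracks only the graphics-state operators `q`/`Q` and the transformation-matrix operator `cm` from the PDF specification; the function names `find_regions`, `mat_mul` and `transform` are hypothetical. This is a minimal illustration under strong assumptions (no strings, arrays, inline images, text, or form-XObject recursion), not the method of the cited works or of any particular library.

```python
# Minimal sketch: locate drawn objects by interpreting PDF content-stream operators.
# Assumes the stream is already decompressed and contains only numbers, /Names and operators.

def mat_mul(m1, m2):
    # PDF matrices [a b c d e f] stand for [[a, b, 0], [c, d, 0], [e, f, 1]].
    # "cm" sets CTM' = M x CTM, so m1 is the new matrix and m2 the current CTM.
    a1, b1, c1, d1, e1, f1 = m1
    a2, b2, c2, d2, e2, f2 = m2
    return (a1 * a2 + b1 * c2,
            a1 * b2 + b1 * d2,
            c1 * a2 + d1 * c2,
            c1 * b2 + d1 * d2,
            e1 * a2 + f1 * c2 + e2,
            e1 * b2 + f1 * d2 + f2)


def transform(m, x, y):
    # Map a user-space point into page space with matrix m.
    a, b, c, d, e, f = m
    return (a * x + c * y + e, b * x + d * y + f)


def bbox(points):
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


def find_regions(stream):
    ctm = (1.0, 0.0, 0.0, 1.0, 0.0, 0.0)  # identity transformation matrix
    saved = []      # CTMs pushed by "q", restored by "Q"
    operands = []   # operands waiting for the next operator
    images, rects = [], []
    for tok in stream.split():
        if tok.startswith("/"):          # a name operand, e.g. /Im1
            operands.append(tok)
            continue
        try:
            operands.append(float(tok))  # a numeric operand
            continue
        except ValueError:
            pass                         # otherwise the token is an operator
        if tok == "q":
            saved.append(ctm)
        elif tok == "Q":
            ctm = saved.pop() if saved else ctm
        elif tok == "cm":
            ctm = mat_mul(tuple(operands[-6:]), ctm)
        elif tok == "Do":
            # An image XObject is painted into the unit square of user space.
            corners = [transform(ctm, x, y) for x, y in ((0, 0), (1, 0), (0, 1), (1, 1))]
            images.append((operands[-1], bbox(corners)))
        elif tok == "re":
            x, y, w, h = operands[-4:]
            corners = [transform(ctm, x + dx, y + dy)
                       for dx, dy in ((0, 0), (w, 0), (0, h), (w, h))]
            rects.append(bbox(corners))
        operands.clear()                 # every operator consumes its operands
    return images, rects


if __name__ == "__main__":
    content = ("q 200 0 0 150 72 500 cm /Im1 Do Q "
               "q 1 0 0 1 72 300 cm 0 0 468 120 re S Q")
    print(find_regions(content))
```

Run on the sample stream, the image `/Im1` is reported at (72, 500)–(272, 650) and the stroked rectangle at (72, 300)–(540, 420) in page coordinates. Real PDFs add compression, text operators, clipping paths and nested form XObjects, which is part of why the approach works only "in theory" and why figure and caption detection from vector instructions remains an active research area.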