Proceedings of the 16th ACM/IEEE-CS on Joint Conference on Digital Libraries 2016
DOI: 10.1145/2910896.2910904

PDFFigures 2.0

Cited by 97 publications
(41 citation statements)
References 6 publications
“…They also present a figure classification dataset namely "FigureSeer". Clark et al [22] present another method "PDFFigures 2.0" to parse and classify figures from PDF documents along with a new dataset. Siegel et al [23] present "DeepFigures" a deep neural method for detecting figures from PDF documents.…”
Section: Related Work (mentioning)
confidence: 99%
“…First, a variety of natural language processing (NLP) approaches has been proposed [3, 5–8]. Second, computer vision systems have been developed which extract information from figures and graphics [9,10]. An NLP-based tool that deals directly with SEMs is presented by Bong et al [1].…”
Section: Related Research (mentioning)
confidence: 99%
“…These tools are capable of extracting captions, references and other literature meta information; however, they cannot recognize and extract whole figures or tables from a paper. Other researchers use handcrafted features or heuristics to segment different parts of a PDF file and leverage the information contained in figures and tables [10,13]. More recent approaches try to utilize deep learning techniques like CNNs and pixel-wise segmentation for this task [9,14].…”
Section: Related Research (mentioning)
confidence: 99%
“…More recent methods, typically based on multiple domain-specific heuristic rules, have been developed for specific research areas, such as high-energy physics (PDFPlotExtractor, Praczyk et al, 2013) and computer science (pdffigures2, Clark and Divvala, 2016). While these tools utilize clustering and classification for separating certain types of graphics, vector graphics are often incorrectly extracted due to the complex figure and document structure.…”
Section: Introduction (mentioning)
confidence: 99%