Proceedings of the 18th ACM/IEEE on Joint Conference on Digital Libraries 2018
DOI: 10.1145/3197026.3197040

Extracting Scientific Figures with Distantly Supervised Neural Networks

Abstract: Non-textual components such as charts, diagrams and tables provide key information in many scientific documents, but the lack of large labeled datasets has impeded the development of data-driven methods for scientific figure extraction. In this paper, we induce high-quality training labels for the task of figure extraction in a large number of scientific documents, with no human intervention. To accomplish this we leverage the auxiliary data provided in two large web collections of scientific documents (arXiv a…

Cited by 109 publications (83 citation statements)
References 22 publications
“…We evaluate our approaches by comparing them to the approaches from the related work. Specifically, we compare our approaches against PDFFigures [3], PDFFigures2 [4] and DeepFigures [5]. The results of our extraction pipeline show that our best approach automatically extracts figures with a precision of 0.73 and a recall of 0.80.…”
Section: Document Element Recognition
mentioning
confidence: 99%
“…Furthermore, a series of systems called PDFFigures [3], PDFFigures2 [4], which were developed by Clark et al as well as DeepFigures [5] that was developed by Siegel et al were developed for the inclusion in the Semantic Scholar search engine (https://www.semanticscholar.org) and discussed in the literature. In the following, we refer to these systems as the PDFFigures systems.…”
Section: Related Work
mentioning
confidence: 99%
“…Being able to automatically identify and decode mathematics (Lin et al, 2011;Wang and Liu, 2017a,b) in PDF files will enable a wide range of high-level applications such as information retrieval, machine reading, similarity analysis, information aggregation, and reasoning. Siegel et al (2018) discuss how to recover the positional information of figures in PDF files. The proposed methods could be also used for the alignment of MEs in PDF and XML files.…”
Section: A3 Action-graphs From Real Annotated Graphs
mentioning
confidence: 99%
“…There is also an ongoing work on constructing knowledge graph from the scientific literature. Sinha et al (2015) builds a heterogeneous graph consisting of six types of entities: field of study, author, institution (the affiliation of the author), paper, venue (journal and conference series) and event. Ammar et al (2018) focussed on constructing literature graph consisting of papers, authors, entities nodes and various interactions between…”
mentioning
confidence: 99%