Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining 2018
DOI: 10.1145/3219819.3219834

Corpus Conversion Service

Abstract: Over the past few decades, the amount of scientific articles and technical literature has increased exponentially in size. Consequently, there is a great need for systems that can ingest these documents at scale and make the contained knowledge discoverable. Unfortunately, both the format of these documents (e.g. the PDF format or bitmap images) as well as the presentation of the data (e.g. complex tables) make the extraction of qualitative and quantitative data extremely challenging. In this paper, we present a…

Cited by 37 publications (7 citation statements)
References 11 publications
“…DeepSearch has been previously used in various publications such as [1][2][3][4]. These publications demonstrate the effectiveness of DeepSearch in extracting and analyzing text-based data.…”
Section: Appendix A Methods (mentioning)
confidence: 91%
“…The Thirty-Eighth AAAI Conference on Artificial Intelligence (AAAI-24) (Zhang et al 2023; Li et al 2022; Huang et al 2022). On the other hand, deep learning-based language processing methods are applied on the native PDF content (generated by a single PDF printing command) (Auer et al 2022; Livathinos et al 2021; Staar et al 2018).…”
Section: Related Work (mentioning)
confidence: 99%
“…A PDF document provides identical representation on any device and any OS. PDF documents are the de facto standard electronic document, and Adobe has estimated that there were 2.5 trillion PDF documents in circulation [7]. Furthermore, PDF has the specification to validate the content integrity by using the digital signature [8].…”
Section: Related Research 2.1 PDF and HTML (mentioning)
confidence: 99%