Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining 2022
DOI: 10.1145/3534678.3539043

DocLayNet: A Large Human-Annotated Dataset for Document-Layout Segmentation

Cited by 38 publications (28 citation statements)
References 11 publications
“…The lack of generalizability is a known and pervasive problem in the field of document layout analysis [e.g. 6,13]. A change in not only publication type, but simply publication year can drastically lower the accuracy of page object extraction methods for models that are not explicitly trained on this type of document [10,12,14].…”
Section: The Problem Of Generalizability | Type: mentioning
confidence: 99%
“…are looking for on a document's page. As discussed in [14], when different annotators work on the same dataset, they can disagree on the class definitions, leading to inconsistent data within the same dataset [13,20]. While there have been pushes to adopt a standardized methodology for defining page object classes [e.g.…”
Section: Inconsistent Page Object Definitions | Type: mentioning
confidence: 99%