2013 12th International Conference on Document Analysis and Recognition (ICDAR), 2013
DOI: 10.1109/icdar.2013.290

ICDAR 2013 Competition on Book Structure Extraction

Abstract: This paper summarizes the 3rd Book Structure Extraction competition, which was run at ICDAR 2013. Its goal is to evaluate and compare automatic techniques for deriving structure information from digitized books, which could then be used to aid navigation inside the books. More specifically, the participants' task is to construct hyperlinked tables of contents for a collection of 1,000 digitized books. This paper reviews the setup of the competition, the book collection used in the task, an…

Cited by 19 publications (22 citation statements)
References 15 publications (18 reference statements)

“…Recent algorithms have explored TOC extraction by parsing TOC pages and extracting the hierarchical structure of sections and subsections. Most methods in this area have been developed in the context of the INEX [20] and ICDAR competitions [21][22][23], which, as we have mentioned before, focus on long, old, digitised historical books rather than short scientific articles. To the best of our knowledge, the only work conducted outside these competitions on the topic of TOC page parsing is [24,25], who apply a rule-based approach to PDF document layout analysis.…”
Section: TOC Extraction Methods (mentioning)
confidence: 99%
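As a rough illustration of what "parsing TOC pages" with rules can look like, here is a minimal sketch that splits each OCR'd line of a TOC page into a title and a trailing page number. The single regular expression (title, optional dot leaders, Arabic page number at the end of the line) and the names `TocEntry`, `parse_toc_line`, and `parse_toc_page` are hypothetical assumptions for illustration; this is not the method of [24,25] or of any competition participant.

```python
import re
from typing import List, NamedTuple, Optional


class TocEntry(NamedTuple):
    title: str
    page: int


# Hypothetical rule: a TOC line is "<title> [dot leaders] <Arabic page number>".
# The lazy title group stops before any trailing run of dots/spaces.
TOC_LINE = re.compile(r"^\s*(?P<title>.+?)[\s.·]*\s(?P<page>\d+)\s*$")


def parse_toc_line(line: str) -> Optional[TocEntry]:
    """Return a TocEntry if the line matches the rule, else None."""
    m = TOC_LINE.match(line)
    if m is None:
        return None
    title = m.group("title").strip()
    if not title:  # a bare page number is not an entry
        return None
    return TocEntry(title, int(m.group("page")))


def parse_toc_page(lines: List[str]) -> List[TocEntry]:
    """Keep only the lines of a TOC page that parse as entries."""
    entries = (parse_toc_line(line) for line in lines)
    return [e for e in entries if e is not None]
```

For example, `parse_toc_line("Chapter I. The Beginning ........ 15")` yields `TocEntry("Chapter I. The Beginning", 15)`, while headings such as "Contents" are dropped. Under this simplified rule, entries with Roman-numeral page numbers are also dropped, which a real system for historical books would need to handle.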
“…Lastly, a number of methods have been proposed to detect titles using machine learning methods based on layout and text features. In such approaches, the list of titles is hierarchically ordered according to a predefined rule-based function [21,26,27].…”
Section: TOC Extraction Methods (mentioning)
confidence: 99%
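The step of ordering detected titles "according to a predefined rule-based function" can be sketched as follows, assuming the titles arrive as a flat list in reading order. The keyword rule in `keyword_level` ("Part" above "Chapter" above everything else) and all names here are hypothetical; the cited works [21,26,27] may use entirely different rules or features. The rule only assigns a level to each title, and a stack then nests each title under the nearest preceding title of a shallower level.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class TocNode:
    title: str
    page: int
    level: int
    children: List["TocNode"] = field(default_factory=list)


def keyword_level(title: str) -> int:
    """Hypothetical rule: 'Part' is level 1, 'Chapter' level 2, all else level 3."""
    first = title.split(maxsplit=1)[0].lower() if title.strip() else ""
    if first == "part":
        return 1
    if first == "chapter":
        return 2
    return 3


def build_hierarchy(
    titles: List[Tuple[str, int]],
    level_of: Callable[[str], int] = keyword_level,
) -> List[TocNode]:
    """Nest a flat, reading-ordered list of (title, page) pairs into a tree."""
    roots: List[TocNode] = []
    stack: List[TocNode] = []  # currently open ancestors, shallowest first
    for title, page in titles:
        node = TocNode(title, page, level_of(title))
        # Close every open entry at the same or a deeper level than this one.
        while stack and stack[-1].level >= node.level:
            stack.pop()
        (stack[-1].children if stack else roots).append(node)
        stack.append(node)
    return roots
```

Keeping the level rule as a separate `level_of` parameter means a different rule (for example, one based on font size) can be swapped in without touching the nesting logic.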
“…Several approaches are meant to address the extraction of books' ToCs. They can be classified into three types, namely approaches based on the detection of ToC pages, approaches based on the whole book content, and hybrid ones [4].…”
Section: Related Work (mentioning)
confidence: 99%
“…In this paper, we present an approach based on the aggregation of the existing approaches. We utilise the combination of two set operators (the union and the intersection) and two properties (title and page number) to aggregate submissions of the ICDAR book structure extraction competitions in 2009 [2], 2011 [3], and 2013 [4]. Our method is evaluated by the title-based and link-based measures over three book structure extraction competitions' datasets.…”
Section: Introduction (mentioning)
confidence: 99%
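To make the idea of aggregating submissions with set operators concrete, here is a minimal sketch that combines several submitted ToCs for one book by taking the union or intersection of their entries, keyed either on the title or on the page number. The entry representation, the title normalisation in `by_title`, the tie-breaking (keep the first submission's version of an entry), and every name below are assumptions for illustration only; the cited paper's actual aggregation procedure may differ.

```python
from typing import Callable, Dict, Hashable, List, Set, Tuple

Entry = Tuple[str, int]  # (title, page number) of one ToC entry


def by_title(e: Entry) -> Hashable:
    # Assumed normalisation: collapse whitespace and ignore case.
    return " ".join(e[0].split()).casefold()


def by_page(e: Entry) -> Hashable:
    return e[1]


def aggregate(
    submissions: List[List[Entry]],
    key: Callable[[Entry], Hashable],
    op: str,
) -> List[Entry]:
    """Combine submitted ToCs with 'union' or 'intersection' on the given key."""
    keyed: List[Dict[Hashable, Entry]] = [{key(e): e for e in sub} for sub in submissions]
    key_sets: List[Set[Hashable]] = [set(d) for d in keyed]
    if op == "union":
        selected = set().union(*key_sets)
    elif op == "intersection":
        selected = set.intersection(*key_sets) if key_sets else set()
    else:
        raise ValueError(f"unknown operator: {op!r}")
    # For each selected key, keep the version from the earliest submission.
    result: Dict[Hashable, Entry] = {}
    for d in keyed:
        for k, e in d.items():
            if k in selected and k not in result:
                result[k] = e
    # Present the aggregated ToC in page order.
    return sorted(result.values(), key=lambda e: (e[1], e[0]))
```

With two submissions `[("Chapter 1", 5), ("Chapter 2", 20)]` and `[("chapter 1", 5), ("Chapter 3", 40)]`, the title-based intersection keeps only `("Chapter 1", 5)`, while the union keeps all three distinct chapters. Intersection favours precision and union favours recall, which is presumably why both operators are worth comparing.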
“…In addition to these tasks, the Structure Extraction (SE) task ran at ICDAR 2013 [3], with the aim of evaluating automatic techniques for deriving structure from OCR and building hyperlinked tables of contents. The extracted structure could then be used to aid navigation inside the books.…”
Section: Aims and Tasks (mentioning)
confidence: 99%