2017
DOI: 10.1016/j.patcog.2016.10.023

A comprehensive survey of mostly textual document segmentation algorithms since 2008

Abstract: In document image analysis, segmentation is the task that identifies the regions of a document. The increasing number of applications of document analysis requires a good knowledge of the available technologies. This survey highlights the variety of the approaches that have been proposed for document image segmentation since 2008. It provides a clear typology of documents and of document image segmentation algorithms. We also discuss the technical limitations of these algorithms, the way …

Cited by 79 publications (39 citation statements). References: 121 publications.
“…In the past decades, researchers have proposed some methods to extract layout information from the printed or handwritten document images. Sébastien et al [9] divided these methods into three categories. The first category is usually aiming at segmenting a specific, predefined kind of layout such as a Manhattan layout for instance [10], [11].…”
Section: A. Layout Segmentation; mentioning (confidence: 99%)
“…In order to develop a comprehensive model which can be used on diversified publishing styles, we chose ESWC 2016 challenge task 2 published dataset. Various gold standard datasets from ESWC challenge are available at the link 13 along with an evaluation tool. This dataset consists of research articles having diversified format and styles adopted from publishers like ACM, LNCS, and IEEE.…”
Section: A. Dataset; mentioning (confidence: 99%)
“…Eskenazi et al [9] surveyed and proposed a typology for most document segmentation algorithms between 2008 and 2017. According to this typology, PIVAJ's segmentation part would belong to group 3: Layout potentially unconstrained, subgroup hybrid techniques, as it combines methods of group 2's segmentation by feature classification (B&W for text and separators, grayscale for pictures) with another higher-level classifier for text level identification.…”
Section: Selection of the Article Extraction Tool; mentioning (confidence: 99%)