Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings. 2003
DOI: 10.1109/icdar.2003.1227697

Automated detection and segmentation of table of contents page from document images

Cited by 19 publications (6 citation statements)
References 8 publications
“…Similar with the research of Tsuruoka et al (2001) and Mandal et al (2003) in English documents, Sun et al (2004) made use of the rules of indentation in TOCs of Chinese books, and developed an algorithm to digitalize TOCs of Chinese books based on OCR technology and indentation analysis. Gao et al (2010) noticed the style consistence phenomenon of TOCs of Chinese books, and put forward a Chinese book TOC recognition method by detecting decorative elements in the TOC based on clustering techniques.…”
Section: Review of Related Literature (citation type: mentioning)
confidence: 97%
“…And according to the way the models are generated, those approaches can be classified into two types: rule-based and learning-based. For example, Mandal et al [1] proposed a method of detecting TOC pages in a document, relying on page number-related heuristics and working on page images. Tsuruoka et al [2] used the indentation and font size to extract structural elements such as chapters and sections in a book.…”
Section: Related Work (citation type: mentioning)
confidence: 99%
“…Previous works have mentioned very little on how to get the individual TOC entries from TOC pages or how to segment TOC pages into TOC entries. Only Mandal et al [1] proposed a method to process broken-in lines based on predefined TOC component models, which cannot adapt to various TOC styles. In this paper, we use clustering techniques to generate a matched TOC model based on "document intrinsic format consistency".…”
Section: TOC Parsing (citation type: mentioning)
confidence: 99%
“…Mandal et al [1] proposed to extract the TOC from the scanned documents. Their approach is primarily based on optical character recognition (OCR), page heuristics and related techniques.…”
Section: Related Work (citation type: mentioning)
confidence: 99%