Eighth International Conference on Document Analysis and Recognition (ICDAR'05) 2005
DOI: 10.1109/icdar.2005.215

Semantics-based content extraction in typewritten historical documents

Abstract: This paper presents a flexible approach to extracting content from scanned historical documents using semantic information.

Citations: cited by 27 publications (32 citation statements)
References: 5 publications
“…The determination of word breaks is made in a manner that adapts to the writing style of the individual. For the case of historical machine-printed documents, Antonacopoulos and Karatzas [7] calculate and analyze the horizontal projection profile to identify suitable spaces between words. In the work of Gatos et al. [8], word segmentation in historical machine-printed documents is based on a run length smoothing in the horizontal and vertical directions.…”
Section: Introduction (mentioning)
confidence: 99%
“…The authors, with other colleagues in their laboratory, have implemented and experimented with various such thresholding techniques [5]. That work demonstrated that the indiscriminate application of any thresholding approach (global or local) does not yield as good results as when a method is applied only to the segmented text.…”
Section: Introduction (mentioning)
confidence: 99%
“…Features based upon a convex hull are insensitive to character fonts and sizes, the touching-character problem of various fonts and sizes can be handled even for heavily touching characters or italic-type overlapping characters without slant correction. Table 1 summarizes the characteristics of those approaches [1,8,10,14,4] mentioned above.…”
Section: Introduction (mentioning)
confidence: 99%
“…The most known of these segmentation algorithms are the following: projection analysis, connected component analysis, Run Length Smoothing Algorithm (RLSA), contour shape analysis and Hough transform. Representative examples of character segmentation methodologies are the following: Antonacopoulos and Karatzas [1] use the horizontal projection profile of each word segment for character segmentation in historical machine-printed documents. This approach cannot handle the case of overlapping characters.…”
Section: Introduction (mentioning)
confidence: 99%
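
Several of the statements above describe projection-profile analysis for finding spaces in machine-printed text. The following is a minimal sketch of that general idea, not the method of the cited paper: it assumes a binarized text-line image given as a list of rows (1 = ink, 0 = background), counts ink per pixel column, and reports interior empty runs at least `min_gap` columns wide. The function names and the `min_gap` parameter are hypothetical, chosen only for illustration.

```python
def column_profile(image):
    """Count ink pixels (value 1) in each pixel column of a binary image."""
    return [sum(column) for column in zip(*image)]


def find_gaps(profile, min_gap):
    """Return (start, end) column ranges, end exclusive, of interior
    empty runs that are at least min_gap columns wide.

    Empty runs before the first ink column or after the last one are
    treated as margins and ignored.
    """
    gaps = []
    start = None
    seen_ink = False
    for x, count in enumerate(profile):
        if count == 0:
            # Open a candidate gap only once some ink has been seen.
            if seen_ink and start is None:
                start = x
        else:
            # Ink closes any open gap; keep it if it is wide enough.
            if start is not None and x - start >= min_gap:
                gaps.append((start, x))
            start = None
            seen_ink = True
    return gaps


# Toy text line: two narrow strokes, a wide space, then another stroke.
row = [0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0]
image = [row, row]
word_gaps = find_gaps(column_profile(image), min_gap=2)
```

In this sketch a fixed `min_gap` separates narrow inter-character spaces from wider inter-word spaces; a real system would likely derive that threshold from the document itself, which is the kind of adaptation the statements above allude to.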