2001
DOI: 10.1007/pl00013573

Retrieving information from document images: problems and solutions

Abstract: An information retrieval system that captures both visual and textual contents from paper documents can derive maximal benefits from DAR techniques while demanding little human assistance to achieve its goals. This article discusses technical problems, along with solution methods, and their integration into a well-performing system. The discussion focuses on very difficult applications, for example, Chinese and Japanese documents. Solution methods are also highlighted, with the emphasis placed upon some ne…

citations
Cited by 17 publications
(8 citation statements)
references
References 19 publications
“…Each component is labelled using a contour tracing technique; as polygons are fully determined by vectors, so too are components fully determined by their respective contours, and as such all component pixels can be found. High-order objects such as characters, textlines, and text regions need to be classified in order to effectively perform a task such as DAR [11], but the application of the technique to blotch detection in film sequences appears to be novel.…”
Section: A Linear-time Contour Tracing Component-labelling Algorithm (mentioning)
confidence: 99%
“…2c). To solve this binarization problem, we propose a binarization method [4,5] that combines a global threshold method with a window-based method. A global threshold T is obtained using Otsu's binarization method [24].…”
Section: Second Stage: Image Binarization (mentioning)
confidence: 99%
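The global step mentioned in this statement, obtaining a threshold T with Otsu's method, can be illustrated with a minimal sketch. It covers only the global threshold, not the window-based part or the way the citing paper combines the two, which the excerpt does not describe. The function names `otsu_threshold` and `binarize`, the 256-level grayscale range, and the choice to treat dark pixels as foreground are assumptions made for illustration.

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold T that maximizes between-class variance.

    `pixels` is a flat list of integer gray levels in [0, levels).
    Pixels with value <= T form one class, the rest form the other.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with value <= t
    sum0 = 0  # sum of gray levels of those pixels
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue  # lower class still empty
        w1 = total - w0
        if w1 == 0:
            break  # upper class empty from here on
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor 1 / total**2.
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(pixels, t):
    """Mark dark pixels (value <= t) as foreground 1 (assumed polarity)."""
    return [1 if p <= t else 0 for p in pixels]


if __name__ == "__main__":
    # Toy image: 50 dark "ink" pixels and 50 bright "paper" pixels.
    image = [10] * 50 + [200] * 50
    T = otsu_threshold(image)
    print(T, sum(binarize(image, T)))
```

A single global T of this kind tends to fail on unevenly lit or low-contrast pages, which is presumably why the quoted method pairs it with a window-based step.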
“…For this purpose, it is useful to estimate the stroke width of each character and set the window size slightly larger than that width. To estimate the stroke width, one can employ Hadamard multiresolution analysis [4,5]. Hadamard kernels of various scales are applied to each character pixel horizontally and vertically to obtain the strength of each character pixel.…”
Section: Second Stage: Image Binarization (mentioning)
confidence: 99%
“…For this reason, usage of LLAH for documents in other languages has not been shown yet. If a technique depends on characteristics of a language, applying the technique to different languages sometimes causes difficult problems [2]. However, applying LLAH to other languages is beneficial for users of the languages.…”
Section: Introduction (mentioning)
confidence: 99%