2006
DOI: 10.1007/11669487_21

Language Identification in Degraded and Distorted Document Images

Abstract: This paper presents a language identification technique that differentiates Latin-based languages in degraded and distorted document images. Different from the reported methods that transform word images through a character shape coding process, our method directly captures word shapes with the local extremum points and the horizontal intersection numbers, which are both tolerant of noise, character segmentation errors, and slight skew distortions. For each language studied, a word shape template and…
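One of the two features named in the abstract, the horizontal intersection number, can be illustrated with a minimal sketch. It assumes a binarized word image stored as rows of 0 (background) and 1 (ink), and counts how many separate ink runs each horizontal scan line crosses. The function names and the choice of scan lines are illustrative assumptions, not the authors' implementation.

```python
from typing import List, Sequence


def horizontal_intersections(row: Sequence[int]) -> int:
    """Count the ink runs crossed by one horizontal scan line.

    A run starts at every background-to-ink (0 -> 1) transition.
    """
    count = 0
    previous = 0  # treat the area left of the image as background
    for pixel in row:
        if pixel and not previous:
            count += 1
        previous = pixel
    return count


def intersection_profile(image: Sequence[Sequence[int]]) -> List[int]:
    """Return the intersection count for every row of a binary word image."""
    return [horizontal_intersections(row) for row in image]


# Hypothetical 3-row word image: 1 = ink, 0 = background.
word = [
    [0, 1, 1, 0, 1],
    [1, 0, 1, 0, 1],
    [0, 0, 0, 0, 0],
]
profile = intersection_profile(word)
```

Because the count depends only on how many strokes a scan line crosses, isolated gaps between characters do not change it, which is consistent with the abstract's claim of tolerance to character segmentation errors.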

Cited by 9 publications (10 citation statements)
References 9 publications
“…In Ref. [17], we proposed to first remove impulse noise by a size filtering process where the filtering threshold depends heavily on the image resolution. Here, we instead suppress impulse noise by using a center-weighted median filter [19].…”
Section: Document Image Preprocessing (mentioning)
confidence: 99%
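The center-weighted median filter mentioned in this citation statement can be sketched as follows. This is a generic illustration of the technique, not the citing authors' code: the 3×3 window, the default center weight of 3, and leaving border pixels unchanged are all assumptions made here.

```python
from statistics import median
from typing import List, Sequence


def center_weighted_median(
    image: Sequence[Sequence[int]], center_weight: int = 3
) -> List[List[int]]:
    """Apply a 3x3 center-weighted median filter to a grayscale image.

    The center pixel is counted `center_weight` times in the window before
    the median is taken. A larger weight preserves more fine detail; a
    weight of 1 gives the ordinary median filter.
    """
    if center_weight < 1 or center_weight % 2 == 0:
        raise ValueError("center_weight must be a positive odd integer")

    height, width = len(image), len(image[0])
    output = [list(row) for row in image]  # border pixels are copied as-is

    for y in range(1, height - 1):
        for x in range(1, width - 1):
            window = [
                image[y + dy][x + dx]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            ]
            # The center already appears once; add the extra copies.
            window += [image[y][x]] * (center_weight - 1)
            output[y][x] = median(window)
    return output


# Hypothetical image with a single impulse in the middle.
noisy = [
    [0, 0, 0],
    [0, 255, 0],
    [0, 0, 0],
]
cleaned = center_weighted_median(noisy)
```

Unlike a size filter, this operates on a fixed pixel neighborhood and needs no threshold tied to image resolution, which matches the motivation given in the quoted statement.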
“…In this paper, we adopt the word shape coding scheme reported in our earlier work [17,18] and use it for the document vector construction. Two word shape features are utilized including the character extremum points and the number of horizontal word cuts illustrated in Fig.…”
Section: Word Shape Coding (mentioning)
confidence: 99%
“…[9][16][1]. In such an environment, the large volume of data and variety of scripts make such manual identification unworkable [9][16]. In such cases the ability to automatically determine the script, and further, the language of a document, would reduce the time and cost of document handling.…”
Section: Introduction (mentioning)
confidence: 99%