Abstract: This paper presents a language identification technique that differentiates Latin-based languages in degraded and distorted document images. Different from the reported methods that transform word images through a character shape coding process, our method directly captures word shapes with the local extremum points and the horizontal intersection numbers, which are both tolerant of noise, character segmentation errors, and slight skew distortions. For each language studied, a word shape template and…
“…In Ref. [17], we proposed to first remove impulse noise by a size filtering process where the filtering threshold depends heavily on the image resolution. Here, we instead suppress impulse noise by using a center-weighted median filter [19].…”
Section: Document Image Preprocessing (mentioning)
confidence: 99%
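The excerpt above names a center-weighted median filter for impulse-noise suppression but gives no parameters. The following is a minimal sketch of that general filter, not the citing authors' implementation: the function name `center_weighted_median`, the 3x3 window, the center weight of 3, and the edge-replicating border handling are all assumptions chosen for illustration.

```python
import numpy as np

def center_weighted_median(image, size=3, center_weight=3):
    """Center-weighted median filter (illustrative sketch).

    The center pixel is counted `center_weight` times when taking the
    window median. `size` and `center_weight` are assumed to be odd so
    that the number of samples is odd and the median is a real sample.
    """
    pad = size // 2
    # Replicate border pixels so every pixel has a full window (assumption).
    padded = np.pad(image, pad, mode="edge")
    out = np.empty_like(image)
    for r in range(image.shape[0]):
        for c in range(image.shape[1]):
            window = padded[r:r + size, c:c + size].ravel()
            # Add (center_weight - 1) extra copies of the center pixel.
            extra = np.full(center_weight - 1, image[r, c])
            out[r, c] = np.median(np.concatenate([window, extra]))
    return out

# Isolated impulse: a single bright pixel on a dark background.
impulse = np.zeros((5, 5), dtype=np.uint8)
impulse[2, 2] = 255
impulse_filtered = center_weighted_median(impulse)

# Small solid stroke: a 3x3 bright block on a dark background.
block = np.zeros((7, 7), dtype=np.uint8)
block[2:5, 2:5] = 255
block_weighted = center_weighted_median(block, center_weight=3)
block_plain = center_weighted_median(block, center_weight=1)
```

In this toy case the isolated pixel is removed, while the corners of the 3x3 block survive the weighted filter but are eroded by the plain median (`center_weight=1`). Unlike a size-based filter, no threshold tied to the image resolution is involved.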
“…In this paper, we adopt the word shape coding scheme reported in our earlier work [17,18] and use it for the document vector construction. Two word shape features are utilized including the character extremum points and the number of horizontal word cuts illustrated in Fig.…”
Section: Word Shape Coding (mentioning)
confidence: 99%
“…Its number of horizontal word cuts 11 may be ambiguously interpreted as converted from extremum points over character descenders. Besides, instead of searching for the vector element one by one as done in [17,18], the element within a document vector is arranged in a descending order in term of the word frequency so that a newly converted word shape code may locate the matched vector element as soon as possible.…”
Section: Document Vector Construction (mentioning)
confidence: 99%
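The frequency-ordered document vector described in the excerpt above can be illustrated with a short sketch. This is one possible reading under stated assumptions, not the citing authors' code: the vector is modeled as a list of `[code, count]` pairs, the helper name `add_code` is hypothetical, and the single-letter strings stand in for real word shape codes.

```python
def add_code(vector, code):
    """Insert a word shape code into a frequency-ordered document vector.

    `vector` is a list of [code, count] pairs kept in descending order of
    count, so a linear scan reaches frequent codes first.
    Returns the number of elements inspected during the search.
    """
    for i, entry in enumerate(vector):
        if entry[0] == code:
            entry[1] += 1
            inspected = i + 1
            # Move the entry toward the front while it outranks its predecessor.
            while i > 0 and vector[i - 1][1] < vector[i][1]:
                vector[i - 1], vector[i] = vector[i], vector[i - 1]
                i -= 1
            return inspected
    # Unseen code: append with count 1 (lowest frequency goes last).
    vector.append([code, 1])
    return len(vector) - 1

# Placeholder codes; real word shape codes would come from the coding step.
doc_vector = []
for code in ["a", "b", "b", "c", "b", "a"]:
    add_code(doc_vector, code)

# The most frequent code now sits at the front and is found in one step.
steps_for_b = add_code(doc_vector, "b")
```

After the loop the vector is ordered `b`, `a`, `c`, and the next lookup of `b` inspects a single element, which is the effect the excerpt attributes to the descending-frequency arrangement.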
“…[17,18], we report a language filtering identification technique by using a word shape coding scheme, which converts each document image into an electronic document vector that captures the contents of image documents efficiently. In this paper, we adopt that word shape coding scheme and use it for document image retrieval.…”
“…[9][16] [1]. in such environment the large volume of data and variety of scripts makes such manual identification unworkable [9] [16]. In such cases the ability to automatically determine the script, and further, the language of a document, would reduce the time and cost of document handling.…”