Proceedings of Sixth International Conference on Document Analysis and Recognition
DOI: 10.1109/icdar.2001.953898

A complete OCR for printed Hindi text in Devanagari script

Cited by 53 publications (21 citation statements)
References 9 publications

“…Bansal and Sinha [2] had proposed a segmentation technique for Devanagari script wherein word image is divided into top, core and bottom strips. Top strip is separated from core strip by a header line.…”
Section: Figure 5: Identification Of Break Locations On Images (mentioning)
confidence: 99%
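
The strip split described in the statement above can be illustrated with a minimal sketch. It assumes a binarized word image stored as rows of 0/1 values and assumes the header line is the single row with the most ink pixels (a horizontal projection profile). That detection rule, and the names `find_header_row` and `split_at_header`, are illustrative assumptions; the quoted statement does not say how Bansal and Sinha locate the header line, and the bottom-strip boundary is not covered here.

```python
from typing import List, Tuple

Image = List[List[int]]  # 1 = ink pixel, 0 = background (assumed encoding)


def find_header_row(word: Image) -> int:
    """Return the index of the row with the most ink pixels.

    Assumption: the header line is the densest row of the word image.
    Ties resolve to the topmost such row.
    """
    if not word:
        raise ValueError("empty image")
    row_sums = [sum(row) for row in word]
    return max(range(len(word)), key=row_sums.__getitem__)


def split_at_header(word: Image) -> Tuple[Image, Image]:
    """Split a word image into the strip above the header row and the rest.

    The header row itself is dropped. The second strip still contains both
    the core and bottom strips; separating those needs a further rule.
    """
    header = find_header_row(word)
    return word[:header], word[header + 1:]


# Toy 6x6 word image: row 2 is fully inked and plays the header line.
toy_word = [
    [0, 0, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 1, 0, 0],
]
top_strip, remainder = split_at_header(toy_word)
```

On real scans the header line usually spans several pixel rows, so a practical version would likely take a band of high-density rows rather than a single row.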
“…Research interest in Latin-based OCR faded away more than a decade ago, in favor of Chinese, Japanese, and Korean (CJK) [1,2], followed more recently by Arabic [3,4], and then Hindi [5,6]. These languages provide greater challenges specifically to classifiers, and also to the other components of OCR systems.…”
Section: Introduction (mentioning)
confidence: 99%
“…While Tesseract was originally developed for English, it has since been extended to recognize French, Italian, Catalan, Czech, Danish, Polish, Bulgarian, Russian, Greek, Korean, Spanish, Japanese, Dutch, Chinese, Indonesian, Swedish, German, Thai, Arabic, and Hindi etc. Training the Tesseract OCR Engine for Hindi language requires in-depth knowledge of Devnagari script in order to collect the character set [4]. Moreover, Tesseract OCR Engine does not just require training of the collected dataset but also to tackle the character segmentation and clubbing issues based on the script specific features [5] i.e.…”
Section: Introduction (mentioning)
confidence: 99%