A New Approach to Detect and Extract Characters from Off-Line Printed Images and Text

Choudhary, Amit; Rishi, Rahul; Ahlawat, Savita

doi:10.1016/j.procs.2013.05.056

Cited by 16 publications

(8 citation statements)

References 6 publications


“…Text zones were determined by examining vertical lines by rules, white run pixels were used for segmentation and bounding box coordinates for each related component are used to compute the height of a certain character. Choudhary, Rishi & Ahlawat [16] proposed a method to segment character images from text containing images and good results are achieved which shows the strength of the proposed character detection and extraction technique.…”

Section: B Preprocessing 1) Binarization (mentioning)

confidence: 96%

Mobile Application for Recognizing Text in Degraded Document Images Using Optical Character Recognition with Adaptive Document Image Binarization

Ceniza, Archival, Bongo. 2018. JOIG.

Books and documents degrade over time, which threatens the readability of the printed text. Degradations such as stains can overlap and cover the text, and ink fading can remove the text altogether. Converting these texts into digital format can help preserve them. Optical Character Recognition (OCR) is used to transform them into digital text. With the increasing computing capability and digital imaging of today's smartphones, we can use them as a convenient tool to capture images of these documents and perform OCR directly. In this paper, we propose a mobile application that recognizes text in degraded document images using Tesseract as the OCR engine, with Adaptive Document Image Binarization to improve the engine's performance on degraded document images. The experimental results showed an average character accuracy of 93.17% and word accuracy of 85.82% across 8 degraded document images.

“…It calculates the average grey value for each pixel column then split every blank region in the middle, making it vulnerable to disconnected structure and touching characters. Recently, improved methods have been proposed but are only specific for single language [1,2,5,6,13,16,17,19,23]. Other researches exploit complex processing pipelines and hand-crafted rules to tackle multilingual cases [4,10,24,25].…”

Section: Related Work (mentioning)

confidence: 99%
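The column-projection baseline described in the statement above can be sketched roughly as follows. This is a minimal illustration, not the code of any cited paper: the function names `column_means` and `split_points`, the `blank_threshold` parameter, and the convention that 255 is white background are all assumptions made for the example.

```python
from typing import List


def column_means(image: List[List[int]]) -> List[float]:
    """Average grey value of each pixel column (assumed: 0 = ink, 255 = white)."""
    height = len(image)
    width = len(image[0])
    return [sum(row[x] for row in image) / height for x in range(width)]


def split_points(image: List[List[int]], blank_threshold: float = 250.0) -> List[int]:
    """Return one cut column in the middle of every blank run between ink columns.

    blank_threshold is a hypothetical tuning value: a column whose mean grey
    value is at or above it is treated as blank.
    """
    means = column_means(image)
    cuts: List[int] = []
    run_start = None  # first column of the current blank run, if any
    for x, mean in enumerate(means):
        if mean >= blank_threshold:
            if run_start is None:
                run_start = x
        elif run_start is not None:
            # The blank run covers columns run_start .. x-1. A run that starts
            # at column 0 is a left margin, not a gap between characters.
            if run_start > 0:
                cuts.append((run_start + x - 1) // 2)
            run_start = None
    # A blank run that reaches the right edge is a margin and yields no cut.
    return cuts


# Two separated glyphs: ink in columns 1-2 and 6-7, blank gap in columns 3-5.
separated = [[255, 0, 0, 255, 255, 255, 0, 0, 255]] * 2
print(split_points(separated))

# Touching glyphs leave no blank column, so no cut is found.
touching = [[255, 0, 0, 0, 0, 255]] * 2
print(split_points(touching))
```

The second call illustrates the weakness the statement mentions: when characters touch, no blank column exists and the method produces no split point.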

“…One major reason for poor recognition accuracy in OCR system is the error in character segmentation. Some previous researches [1,2,5,6,13,16,17,19,23] achieve high performance on monolingual texts, but rely on feature engineering specific to single character style. Other researches [4,10,24,25] work on multilingual cases but introduce complex processing pipelines.…”

Section: Introduction (mentioning)

confidence: 99%

Chinese/English mixed Character Segmentation as Semantic Segmentation

Zheng, Wang, Huang et al. 2016. Preprint.

OCR character segmentation for multilingual printed documents is difficult due to the diversity of different linguistic characters. Previous approaches mainly focus on monolingual texts and are not suitable for multilingual cases. In this work, we particularly tackle the Chinese/English mixed case by reframing it as a semantic segmentation problem. We take advantage of the successful architecture called fully convolutional networks (FCN) in the field of semantic segmentation. Given a wide enough receptive field, FCN can utilize the necessary context around a horizontal position to determine whether this is a splitting point or not. As a deep neural architecture, FCN can automatically learn useful features from raw text line images. Although trained on synthesized samples with simulated random disturbance, our FCN model generalizes well to real-world samples. The experimental results show that our model significantly outperforms the previous methods.

“…A pruning process is proposed with some knowledge about characters as well as size constrains proposed in [9]. Amit Choudhary et al. [10] proposed an approach contains several steps. These steps could be summarized as: noise removal (image enhancement), thresholding to turn to binary image in an inverted form, labeling process to segment character images, connected foreground object repeatedly extracted, insignificant and large objects dropped, and finally the remaining labels inverted and considered potential objects.…”

Section: Introduction (mentioning)

confidence: 99%
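The steps summarized in the statement above (inverted thresholding, labeling, extraction of connected foreground objects, dropping of insignificant and large objects) can be sketched as below. This is an illustrative sketch under stated assumptions, not the cited authors' implementation: noise removal and the final label inversion are omitted, 8-connectivity is assumed, and the names `binarize_inverted` and `extract_components` and the `threshold`, `min_pixels`, and `max_pixels` values are hypothetical.

```python
from collections import deque
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def binarize_inverted(gray: List[List[int]], threshold: int = 128) -> List[List[int]]:
    """Inverted binary image: dark ink pixels become 1, background becomes 0."""
    return [[1 if pixel < threshold else 0 for pixel in row] for row in gray]


def extract_components(binary: List[List[int]],
                       min_pixels: int = 2,
                       max_pixels: int = 1000) -> List[Box]:
    """Label 8-connected foreground objects and return their bounding boxes.

    Objects with fewer than min_pixels or more than max_pixels pixels are
    dropped as insignificant or too large (both limits are assumed values).
    """
    height, width = len(binary), len(binary[0])
    seen = [[False] * width for _ in range(height)]
    boxes: List[Box] = []
    for y in range(height):
        for x in range(width):
            if binary[y][x] == 0 or seen[y][x]:
                continue
            # Breadth-first flood fill over one connected object.
            queue = deque([(y, x)])
            seen[y][x] = True
            top, left, bottom, right, count = y, x, y, x, 0
            while queue:
                cy, cx = queue.popleft()
                count += 1
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and binary[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if min_pixels <= count <= max_pixels:
                boxes.append((top, left, bottom, right))
    # Order the boxes left to right, as characters would be read on a line.
    return sorted(boxes, key=lambda box: box[1])


# Two vertical strokes plus a one-pixel speck that the size filter drops.
gray = [
    [255, 0, 255, 255, 255, 0, 255],
    [255, 0, 255, 255, 255, 0, 255],
    [255, 0, 255, 255, 255, 0, 255],
    [255, 255, 255, 0, 255, 255, 255],
]
print(extract_components(binarize_inverted(gray)))
```

Each returned box can then be used to crop one candidate character image from the original page.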

“…Therefore, it gained the attention of the researchers' community. The features presented by the community include [6][7][8][9][10][11][12][13][14][15][16]. Classifiers, according to [17], are built based on similarity, probability or decision boundaries. Similarity classifiers use similarity metrics to measure the closeness to the class members or the class preset representative(s).…”

Section: Introduction (mentioning)

confidence: 99%

Hand-Writing Recognition Using Neural Micro-Classifiers Network

2018. Journal of the ACS Advances in Computer Science.

In this study, a handwriting recognition methodology based on the neural binary micro-classifier network is presented. The proposed methodology uses a simple, well-known feature extraction method: the low-frequency coefficients of the discrete cosine transform. The micro-classifier network is a deterministic four-layer neural network; the four layers are input, micro-classifier, counter, and output. The network provides a confidence factor, and proper generalization is guaranteed. The network also allows incremental learning and is more natural than others. The recognition methodology was tested using the standard MNIST dataset. The experimental results showed comparable performance, taking into consideration the design advantages.
