Combining deep learning and language modeling for segmentation-free OCR from raw pixels

Rawls, Stephen; Cao, Huaigu; Sabir, Ekraam; Natarajan, Prem

doi:10.1109/asar.2017.8067772

Cited by 12 publications

(6 citation statements)

References 16 publications

“…Another such Arabic OCR model was trained on DARPA corpus using stacked BLSTM which is connected with CTC loss function for predicting Arabic text sequences. Also, a language model was added to the technique at the time of prediction to enhance the output of the actual trained OCR [66].…”

Section: B Deep Learning For OCR (mentioning)

confidence: 99%
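The statement above describes adding a language model at prediction time to improve the recognizer's output. One common way to do this is to rescore the recognizer's candidate transcriptions with a weighted language-model score. The sketch below illustrates that general idea only; it is not taken from the cited paper, and the function names, the toy word probabilities, and the `lm_weight` parameter are all hypothetical.

```python
import math

def rescore(hypotheses, lm_logprob, lm_weight=0.5):
    """Return the text with the best combined OCR + LM score.

    hypotheses: list of (text, ocr_logprob) pairs from a recognizer.
    lm_logprob: callable mapping a text string to a log-probability.
    lm_weight:  hypothetical interpolation weight (would be tuned on held-out data).
    """
    best = max(hypotheses, key=lambda h: h[1] + lm_weight * lm_logprob(h[0]))
    return best[0]

# Toy unigram "language model" with made-up word probabilities.
WORD_PROBS = {"the": 0.5, "cat": 0.3, "cot": 0.01}

def toy_lm_logprob(text):
    # Words missing from the table get a small floor probability.
    return sum(math.log(WORD_PROBS.get(w, 1e-6)) for w in text.split())

# The recognizer slightly prefers "the cot"; the LM prefers "the cat".
nbest = [("the cot", -1.0), ("the cat", -1.2)]
best = rescore(nbest, toy_lm_logprob, lm_weight=1.0)
```

With `lm_weight=0.0` the language model is ignored and the recognizer's own top choice is returned, which makes the weight a convenient knob for checking how much the language model contributes.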

“…A lower value of these measures represents a higher effectiveness of a technique and vice versa. The reason for the choice of these measures stems from the fact that these measures have been widely used to evaluate the effectiveness of OCR and speech recognition systems [61], [66], [59]. These measures are based on the Levenshtein distance [75] which measures similarity between two strings.…”

Section: A Performance Evaluation Measures (mentioning)

confidence: 99%
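The measures referred to in this statement, character error rate (CER) and word error rate (WER), are both Levenshtein distances normalized by the reference length. The following is a minimal sketch of how they are typically computed; the function names are illustrative, and it assumes a non-empty reference and whitespace tokenization for words, which may differ from the evaluation setup of the citing paper.

```python
def levenshtein(ref, hyp):
    """Edit distance between two sequences (strings or lists of tokens)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def cer(reference, hypothesis):
    """Character error rate: character-level edits / reference characters."""
    return levenshtein(reference, hypothesis) / len(reference)

def wer(reference, hypothesis):
    """Word error rate: word-level edits / reference words."""
    ref_words = reference.split()
    return levenshtein(ref_words, hypothesis.split()) / len(ref_words)
```

Because insertions count as errors, both rates can exceed 1.0 when the hypothesis is much longer than the reference.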

“…This validates the fact that the trained models are not overfitting and they generalized at both word and text line levels. Also, it can be observed from the figure that WER is higher than CER in every case [61], [66]. It is due to the fact that WER considers words as a single unit and a word is marked as an error even if there is a variation of a single character in it.…”

Section: ) Mixed Fontmentioning

confidence: 99%
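The reasoning in this statement can be made concrete with a small worked example. The numbers are illustrative and not taken from the citing paper: assume a reference line of $N_w = 4$ words and $N_c = 20$ characters in which the recognizer substitutes exactly one character.

```latex
\mathrm{CER} = \frac{\text{character edits}}{N_c} = \frac{1}{20} = 5\%,
\qquad
\mathrm{WER} = \frac{\text{word edits}}{N_w} = \frac{1}{4} = 25\%
```

The single wrong character costs one unit in both numerators, but the WER denominator counts words rather than characters, so the same mistake weighs far more heavily at word level.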

MMU-OCR-21: Towards End-to-End Urdu Text Recognition Using Deep Learning

Nasir, Malik, Shahzad

2021

IEEE Access

Optical Character Recognition (OCR) is a technique that generates text from an image. Recognizing the importance of OCR in real-world settings, a plethora of techniques have been developed for Western, as well as Asian languages. Urdu is a prominent South Asian language and a number of different solutions for Urdu OCR have been proposed. However, fewer attempts have been made to develop end-to-end deep learning-based solutions for recognizing printed Urdu text. Furthermore, several benchmark corpora for Urdu OCR have been developed that can be used for training and evaluation of different OCR techniques. However, there are a number of limitations of the existing Urdu corpora: firstly, most of them have either character or word or text images, which are usually rendered using only a single font, Nastaleeq. Secondly, the volume of the existing datasets is so small that it is not suitable for working with the deep-learning techniques that have achieved groundbreaking results for OCRs. To that end, in this study, we have proposed a very large Multi-level and Multi-script Urdu corpus (MMU-OCR-21). It is the largest-ever Urdu corpus of printed text that is effectively suitable to work with deep learning techniques. In total, the corpus is composed of over 602,472 images, including text-line and word images in three prominent fonts, and their respective ground truth. Also, we have performed experiments using multiple state-of-the-art deep learning techniques for text-line and word level images.

“…While most methods focus on a single language, the multilingual setting was addressed in [18] via a new gated convolutional feature extractor. Note that, while convolutional extractors are the most common ones, fully-connected layers can also be employed, as demonstrated in [19]. In any event, the effectiveness of these techniques has been demonstrated on academic databases only, and these databases consist of standard text lines and paragraphs.…”

Section: A Contributions (mentioning)

confidence: 99%

Field Typing for Improved Recognition on Heterogeneous Handwritten Forms

Tomoiaga, Feng, Salzmann, et al.

2019

2019 International Conference on Document Analysis and Recognition (ICDAR)

Offline handwriting recognition has undergone continuous progress over the past decades. However, existing methods are typically benchmarked on free-form text datasets that are biased towards good-quality images and handwriting styles, and homogeneous content. In this paper, we show that state-of-the-art algorithms, employing long short-term memory (LSTM) layers, do not readily generalize to real-world structured documents, such as forms, due to their highly heterogeneous and out-of-vocabulary content, and to the inherent ambiguities of this content. To address this, we propose to leverage the content type within an LSTM-based architecture. Furthermore, we introduce a procedure to generate synthetic data to train this architecture without requiring expensive manual annotations. We demonstrate the effectiveness of our approach at transcribing text on a challenging, real-world dataset of European Accident Statements.

“…For a reasonably well performing OCR model, we choose one that is very similar in structure to [24] but not finetuned like it. The model is segmentation free and the LSTM output does not require processing except for decoding.…”

Section: Model (mentioning)

confidence: 99%
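The statement above notes that the LSTM output needs no processing other than decoding. The excerpt does not say which decoder is used; assuming a CTC-style output layer (an assumption, not something stated in the quote), the simplest decoder is best-path decoding: take the top class per frame, collapse consecutive repeats, and drop the blank symbol. The function name, alphabet, and frame scores below are hypothetical.

```python
from itertools import groupby

def ctc_greedy_decode(frame_scores, alphabet, blank=0):
    """Best-path CTC decoding.

    frame_scores: list of per-frame score lists, one score per class.
    alphabet:     list mapping class index to symbol; index `blank` is the CTC blank.
    """
    # 1. Pick the highest-scoring class in every frame.
    best_path = [max(range(len(f)), key=f.__getitem__) for f in frame_scores]
    # 2. Collapse runs of the same class into a single occurrence.
    collapsed = [k for k, _ in groupby(best_path)]
    # 3. Remove blanks and map the remaining indices to symbols.
    return "".join(alphabet[k] for k in collapsed if k != blank)

ALPHABET = ["<blank>", "a", "b"]
frames = [
    [0.10, 0.80, 0.10],  # a
    [0.10, 0.70, 0.20],  # a again (same run, collapsed)
    [0.90, 0.05, 0.05],  # blank separates the two a's
    [0.20, 0.70, 0.10],  # a (a new character)
    [0.10, 0.10, 0.80],  # b
]
decoded = ctc_greedy_decode(frames, ALPHABET)
```

No character segmentation is needed at any point: the frames are consumed as a sequence and the blank symbol is what allows a doubled letter to survive the collapse step.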

Implicit Language Model in LSTM for OCR

Sabir, Rawls, Natarajan

2017

2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)

Self Cite

Neural networks have become the technique of choice for OCR, but many aspects of how and why they deliver superior performance are still unknown. One key difference between current neural network techniques using LSTMs and the previous state-of-the-art HMM systems is that HMM systems have a strong independence assumption. In comparison LSTMs have no explicit constraints on the amount of context that can be considered during decoding. In this paper we show that they learn an implicit LM and attempt to characterize the strength of the LM in terms of equivalent n-gram context. We show that this implicitly learned language model provides a 2.4% CER improvement on our synthetic test set when compared against a test set of random characters (i.e. not naturally occurring sequences), and that the LSTM learns to use up to 5 characters of context (which is roughly 88 frames in our configuration). We believe that this is the first ever attempt at characterizing the strength of the implicit LM in LSTM based OCR systems.
