2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)
DOI: 10.1109/icdar.2017.361

Implicit Language Model in LSTM for OCR

Abstract: Neural networks have become the technique of choice for OCR, but many aspects of how and why they deliver superior performance are still unknown. One key difference between current neural network techniques using LSTMs and the previous state-of-the-art HMM systems is that HMM systems have a strong independence assumption. In comparison, LSTMs have no explicit constraints on the amount of context that can be considered during decoding. In this paper we show that they learn an implicit LM and attempt to character…
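To make the architectural contrast concrete, the following is a minimal sketch, not the authors' code, of the LSTM + CTC line-recognizer family the abstract refers to, assuming PyTorch; the hyperparameters (48-pixel line height, hidden size 256, an 80-character alphabet) are illustrative placeholders. Because the bidirectional LSTM state carries unbounded left and right context across the whole line, character predictions can condition on surrounding text, which is the channel through which an implicit language model can emerge, whereas an HMM's independence assumptions rule this out.

import torch
import torch.nn as nn

class LSTMLineRecognizer(nn.Module):
    def __init__(self, line_height=48, hidden_size=256, num_classes=80):
        super().__init__()
        # Bidirectional LSTM reads each pixel column of the text line; its state
        # carries unbounded left/right context, unlike an HMM state.
        self.lstm = nn.LSTM(input_size=line_height, hidden_size=hidden_size,
                            bidirectional=True, batch_first=True)
        self.proj = nn.Linear(2 * hidden_size, num_classes + 1)  # extra class = CTC blank

    def forward(self, lines):
        # lines: (batch, width, line_height) -- pixel columns as the time axis
        features, _ = self.lstm(lines)
        return self.proj(features).log_softmax(dim=-1)

model = LSTMLineRecognizer()
ctc = nn.CTCLoss(blank=80)                  # blank index matches num_classes
lines = torch.randn(4, 200, 48)             # 4 synthetic lines, 200 columns wide
log_probs = model(lines).permute(1, 0, 2)   # CTCLoss expects (time, batch, classes)
targets = torch.randint(0, 80, (4, 20))     # placeholder label sequences
loss = ctc(log_probs, targets,
           torch.full((4,), 200), torch.full((4,), 20))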

Cited by 34 publications (24 citation statements)
References 19 publications (21 reference statements)
“…Transkribus [34,35] is another complex platform for the analysis of historical documents which covers many research areas such as layout analysis and handwritten text recognition. It also includes OCR using the ABBYY FineReader Engine 11. To the best of our knowledge, Tesseract and Transkribus are the best performing OCR systems.…”
Section: Existing Tools and OCR Systems (mentioning)
confidence: 99%
“…Language models are often used in the OCR field to correct recognition errors [41]. Sabir et al. [6] showed that LSTM-based models are able to learn a language model implicitly. This LM is learned jointly during the training of the whole network.…”
Section: Impact of the Implicit Language Model (mentioning)
confidence: 99%
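The passage above contrasts the implicit LM with the conventional use of an explicit language model to correct or rescore OCR output. The following is a minimal, purely illustrative sketch of that explicit approach: an add-alpha-smoothed character bigram model rescoring competing OCR hypotheses. The training string, hypotheses, and function names are made-up placeholders, not taken from either paper.

import math
from collections import Counter

def train_char_bigram(corpus):
    # Count character bigrams and unigrams from a plain-text corpus.
    bigrams = Counter(zip(corpus, corpus[1:]))
    unigrams = Counter(corpus)
    return bigrams, unigrams

def bigram_logprob(text, bigrams, unigrams, alpha=1.0, vocab_size=128):
    # Add-alpha smoothed character bigram log-probability of a hypothesis.
    lp = 0.0
    for a, b in zip(text, text[1:]):
        lp += math.log((bigrams[(a, b)] + alpha) /
                       (unigrams[a] + alpha * vocab_size))
    return lp

bigrams, unigrams = train_char_bigram("the quick brown fox jumps over the lazy dog")
# Pick the OCR hypothesis that the explicit LM prefers.
hypotheses = ["the qvick brown fox", "the quick brown fox"]
best = max(hypotheses, key=lambda h: bigram_logprob(h, bigrams, unigrams))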
“…They enable sharing contextual information between locations and therefore enhance performance. Nevertheless, it is known [22] that they can also learn some form of language modeling.…”
Section: Impact of Language Modeling (mentioning)
confidence: 99%
“…This prevents explicit learning of query selection. However, it is known that recurrent networks can learn implicit tasks [28]. Second, query selection is a soft decision made by the model, whereas a discrete decision is preferred.…”
Section: Appendix A Recurrent Neural Network Architecture (mentioning)
confidence: 99%