Handwriting Recognition of Historical Documents with Few Labeled Data

Chammas, Edgard; Mokbel, Chafic; Likforman-Sulem, Laurence

doi:10.1109/das.2018.15

Cited by 35 publications

(21 citation statements)

References 19 publications

“…See all the available details in [26]. The second row corresponds to a system that used 13 CNN layers and 3 layers of BLSTM, along with a word 2-gram for decoding [31]. The results of the here proposed baseline system are also shown in Table 9.…”

Section: Summary of Results Obtained with the ICDAR-2017 Dataset (mentioning)

confidence: 99%

A set of benchmarks for Handwritten Text Recognition on historical documents

Sánchez, Romero, Toselli, et al. 2019

Pattern Recognition

“…Similar to [32], in [36] they also apply some elastic distortions to the original images. In [4] the authors improve the performance by augmenting the training set with specially crafted multiscale data. They also propose a model-based normalization scheme that considers the variability in the writing scale at the recognition phase.…”

Section: Data Augmentation (mentioning)

confidence: 99%

Boosting Offline Handwritten Text Recognition in Historical Documents With Few Labeled Lines

2021

In this paper we address the problem of offline handwritten text recognition (HTR) in historical documents when few labeled samples are available and some of them contain errors in the train set. Our three main contributions are: first, we analyze how to perform transfer learning (TL) from a massive database to a smaller historical database, analyzing which layers of the model need fine-tuning. Second, we analyze methods to efficiently combine TL and data augmentation (DA). Finally, we propose an algorithm to mitigate the effects of incorrect labeling in the training set. The methods are analyzed over the ICFHR 2018 competition database, Washington and Parzival. Combining all these techniques, we demonstrate a remarkable reduction of CER (up to 6 percentage points in some cases) in the test set with little complexity overhead.

Index terms: connectionist temporal classification (CTC), convolutional neural networks (CNN), data augmentation (DA), deep neural networks (DNN), historical documents, long-short-term-memory (LSTM), offline handwriting text recognition (HTR), outlier detection, transfer learning.

“…Recently, as part of the ICFHR 2018 READ competition [8], most of the participants proposed an optical model composed of a Convolutional Neural Network (CNN) and Bidirectional Long Short Term Memory (BLSTM) layers as in [15] and [16]. One of them was using Multi-dimensional LSTM (MDLSTM) [17], which has provided good performance when trained on a generic large data set, but has shown difficulty in carrying the writer adaptation process with few samples.…”

Section: A. Optical Model Adaptation (mentioning)

confidence: 99%

Improving Text Recognition using Optical and Language Model Writer Adaptation

Soullard, Swaileh, Tranouez, et al. 2019

2019 International Conference on Document Analysis and Recognition (ICDAR)

State-of-the-art methods for handwriting text recognition are based on deep learning approaches and language modeling that require large data sets during training. In practice, there are some applications where the system processes mono-writer documents, and would thus benefit from being trained on examples from that writer. However, it is uncommon to have numerous examples coming from just one writer. In this paper, we propose an approach to adapt both the optical model and the language model to a particular writer, from a generic system trained on large data sets with a variety of examples. We show the benefits of the optical and language model writer adaptation. Our approach reaches competitive results on the READ 2018 data set, which is dedicated to model adaptation to particular writers.
