Improving Text Recognition using Optical and Language Model Writer Adaptation

Soullard, Yann; Swaileh, Wassim; Tranouez, Pierrick; Paquet, Thierry; Chatelain, Clément

doi:10.1109/icdar.2019.00190

Cited by 14 publications

(11 citation statements)

References 27 publications

(37 reference statements)

Supporting

Mentioning

Contrasting

Order By: Relevance

“…In the HTR problem with a reduced training set, TL was applied by Soullard et al in [7]. The main idea behind TL is initializing the parameters of a model by those learned from a huge dataset beforehand, denoted as source.…”

Section: B Transfer Learningmentioning

confidence: 99%

“…Hence, with TL we start learning a different task to avoid learning the whole set of parameters from scratch, preventing overfitting and favoring convergence. In [7], they proposed a method that applies TL in both the optical and the language model. In this and other similar previous proposals on TL, the authors applied DA in both training and test steps.…”

Section: B Transfer Learningmentioning

confidence: 99%

See 1 more Smart Citation

Boosting Offline Handwritten Text Recognition in Historical Documents With Few Labeled Lines

2021

View full text Add to dashboard Cite

In this paper we address the problem of offline handwritten text recognition (HTR) in historical documents when few labeled samples are available and some of them contain errors in the train set. Our three main contributions are: first, we analyze how to perform transfer learning (TL) from a massive database to a smaller historical database, analyzing which layers of the model need fine-tuning. Second, we analyze methods to efficiently combine TL and data augmentation (DA). Finally, we propose an algorithm to mitigate the effects of incorrect labeling in the training set. The methods are analyzed over the ICFHR 2018 competition database, Washington and Parzival. Combining all these techniques, we demonstrate a remarkable reduction of CER (up to 6 percentage points in some cases) in the test set with little complexity overhead.INDEX TERMS connectionist temporal classification (CTC), convolutional neural networks (CNN), data augmentation (DA), deep neural networks (DNN), historical documents, long-short-term-memory (LSTM), offline handwriting text recognition (HTR), outlier detection; transfer learning.

show abstract

Section: B Transfer Learningmentioning

confidence: 99%

Section: B Transfer Learningmentioning

confidence: 99%

Boosting Offline Handwritten Text Recognition in Historical Documents With Few Labeled Lines

2021

View full text Add to dashboard Cite

show abstract

“…For big historical datasets, in [5], the authors demonstrated the benefits of carefully designed data augmentation. Another strategy is to apply transfer learning [12,13,25,15,1], i.e., pretraining the HTR model on a big HTR dataset and fine-tuning it on the small training set of the dataset of interest. For HTR on small single-writer historical datasets, pretraining plus fine-tuning has been proven to be a more effective strategy than data augmentation [1].…”

Section: Related Workmentioning

confidence: 99%

Learning to Read L’Infinito: Handwritten Text Recognition with Synthetic Training Data

Cascianelli

Cornia

Baraldi

et al. 2021

Lecture Notes in Computer Science

View full text Add to dashboard Cite

Section: Related Workmentioning

confidence: 99%

“…Also, classification and indexing of transcript text can be easily automated. Handwritten text recognition (HTR) tasks in historical datasets have been faced by many authors in the last few years [1][2][3][4][5][6][7][8][9][10]. In HTR, transcribing each author can be considered a different task, since the distribution of both model input and output varies from writer to writer.…”

Section: Introductionmentioning

confidence: 99%

Boosting offline handwritten text recognition in historical documents with few labeled lines

Aradillas,

Murillo-Fuentes,

Olmos

2020

Preprint

View full text Add to dashboard Cite

In this paper, we face the problem of offline handwritten text recognition (HTR) in historical documents when few labeled samples are available and some of them contain errors in the train set. Three main contributions are developed. First we analyze how to perform transfer learning (TL) from a massive database to a smaller historical database, analyzing which layers of the model need a finetuning process. Second, we analyze methods to efficiently combine TL and data augmentation (DA). Finally, an algorithm to mitigate the effects of incorrect labelings in the training set is proposed. The methods are analyzed over the ICFHR 2018 competition database, Washington and Parzival. Combining all these techniques, we demonstrate a remarkable reduction of CER (up to 6% in some cases) in the test set with little complexity overhead.

show abstract

Improving Text Recognition using Optical and Language Model Writer Adaptation

Cited by 14 publications

References 27 publications

Boosting Offline Handwritten Text Recognition in Historical Documents With Few Labeled Lines

Boosting Offline Handwritten Text Recognition in Historical Documents With Few Labeled Lines

Learning to Read L’Infinito: Handwritten Text Recognition with Synthetic Training Data

Boosting offline handwritten text recognition in historical documents with few labeled lines

Contact Info

Product

Resources

About