Proceedings of the 2nd International Workshop on Historical Document Imaging and Processing 2013
DOI: 10.1145/2501115.2501123

Generation of learning samples for historical handwriting recognition using image degradation

Abstract: Historical documents pose challenging problems for training handwriting recognition systems. Besides the high variability of character shapes inherent to all handwriting, the image quality can also differ greatly, for instance due to faded ink, ink bleed-through, wrinkled and stained parchment. Especially when only few learning samples are available, it is difficult to incorporate this variability in the morphological character models. In this paper, we investigate the use of image degradation to generate synt…

Citations: cited by 12 publications (10 citation statements)
References: 23 publications (28 reference statements)
“…The 1,524 images from the second dataset have been created using the 127 original images and transformed using our 3D distortion model. The tests presented in [39,59] confirm the conclusion of [60] about the impact of the degradation level on re-training, either for a task of character recognition or layout extraction.…”
Section: Document Image Generation For Retraining Task
Citation type: supporting
confidence: 80%
“…The DocCreator ability to create synthetic documents that mimic real ones is effective for typewritten and handwritten characters (as long as the characters are apart from one another). Images created with DocCreator have already been used in many DIAR contexts: text/background/image pixel classification [36]; staff removal [13,37,38]; and handwritten character recognition [39]. In this article we present how DocCreator can be useful to enhance a binarization algorithm and for OCR performance prediction.…”
Section: Algorithms For Synthetic Data Augmentation
Citation type: mentioning
confidence: 99%
“…Generating synthetic data for the training and evaluation of document image processing systems has been widely addressed in recent years [6,20,21,22,23,24]. In particular, image binarization evaluation is usually computed at pixel level, requiring an accurate groundtruth, with the inner complexity of data supervision at this detail level.…”
Section: Image Processing and Groundtruth Generation
Citation type: mentioning
confidence: 99%
“…Fischer et al. [6] propose a method to generate training samples for historical handwriting recognition. Three degradation models are applied to binary images: Kanungo [3], character degradation from [5] and geometric distortion from the evaluation of [7].…”
Section: Related Work
Citation type: mentioning
confidence: 99%
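
The last citation statement names the Kanungo model as one of the degradations applied to binary images. As a rough illustration only, the sketch below follows the commonly described form of that model: each pixel is flipped with a probability that decays exponentially with its squared distance to the nearest pixel of the opposite class, plus a constant background noise term. The function names, the parameter names (alpha0, alpha, beta0, beta, eta) and the default values are assumptions chosen for this example, not settings taken from the cited paper, and the morphological closing step that usually follows the flipping is omitted for brevity.

```python
import math
import random


def opposite_distance(img):
    """Euclidean distance from each pixel to the nearest pixel of the other class.

    Brute force, so only suitable for small illustrative images.
    """
    h, w = len(img), len(img[0])
    pixels = [(y, x, img[y][x]) for y in range(h) for x in range(w)]
    dist = [[math.inf] * w for _ in range(h)]
    for y, x, v in pixels:
        for y2, x2, v2 in pixels:
            if v2 != v:
                d = math.hypot(y - y2, x - x2)
                if d < dist[y][x]:
                    dist[y][x] = d
    return dist


def kanungo_degrade(img, alpha0=1.0, alpha=1.5, beta0=1.0, beta=1.5,
                    eta=0.0, rng=None):
    """Flip pixels of a binary image (1 = ink, 0 = background).

    Hypothetical parameterization: alpha0/alpha control ink pixels,
    beta0/beta control background pixels, eta is constant noise.
    """
    rng = rng or random.Random()
    dist = opposite_distance(img)
    out = []
    for y, row in enumerate(img):
        new_row = []
        for x, v in enumerate(row):
            d = dist[y][x]
            if v == 1:
                p = alpha0 * math.exp(-alpha * d * d) + eta
            else:
                p = beta0 * math.exp(-beta * d * d) + eta
            # Clip to a valid probability before sampling the flip.
            flip = rng.random() < min(p, 1.0)
            new_row.append(1 - v if flip else v)
        out.append(new_row)
    return out


# Usage: degrade a tiny 4x4 "glyph" with a fixed seed for repeatability.
glyph = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
noisy = kanungo_degrade(glyph, rng=random.Random(0))
```

Pixels close to a stroke edge are the most likely to flip, which is what makes this kind of noise resemble eroded or blotted ink rather than uniform salt-and-pepper noise.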