Proceedings of the 2nd International Workshop on Historical Document Imaging and Processing 2013
DOI: 10.1145/2501115.2501127

An efficient parametrization of character degradation model for semi-synthetic image generation

Abstract: This paper presents an efficient parametrization method for generating synthetic noise on document images. By specifying the desired categories and amount of noise, the method can generate synthetic document images with most of the degradations observed in real document images (ink splotches, white specks or streaks). Because it can simulate different amounts and kinds of noise, it makes it possible to evaluate the robustness of many document image analysis methods. It also permits generating data for …

Citations: cited by 8 publications (11 citation statements)
References: 9 publications
“…The dataset was increased to 500 images by adding noise using the degradation model discussed in [18]. Some sample images are shown in Figure 2(a) & (b).…”
Section: Experimental Evaluations (mentioning)
confidence: 99%
“…2) Hindi Dataset: The Hindi dataset contains a total of 470 text paragraphs extracted from grayscale document images [21]. In order to introduce variations in terms of noise and degradation, we add noise as cuts and merge to images using the degradation model discussed in [18]. Some sample images are shown in Figure 2(c) & (d).…”
Section: Experimental Evaluations (mentioning)
confidence: 99%
“…Such methods can roughly be classified into three types [12]. They are (i) adding noise, [10] (ii) degrading characters, [13] and (iii) distorting the shape of document images [12]. Kieu et.…”
Section: Introduction (mentioning)
confidence: 99%
“…Using synthetic data or synthetically degraded data has many advantages over human supervision including rapid generation of datasets at lower cost, control of degradation level, and fit testing of the same underlying document content with different corruption methods [Baird 2007;Kieu, Visani, Journet, Mullot, et al 2013;Varga et al 2003;Zi et al 2004]. The main idea is to take a clean image as the ground truth and apply several distortions and noise on top of it.…”
Section: Noising Methods (mentioning)
confidence: 99%
“…In particular, DIB evaluation is usually computed at pixel level (Section 4.4), and it requires an accurate ground truth, with the inherent complexity of data supervision at this detail level. Generating synthetic data for training and evaluating document image processing systems is a topic that has been widely addressed in recent years [Baird 2007;Kieu, Visani, Journet, Mullot, et al 2013;Varga et al 2003;Zi et al 2004]. To overcome this issue, there are several techniques to generate useful ground truths.…”
Section: Ground Truth Generation (mentioning)
confidence: 99%
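
Several of the statements above describe the same basic workflow: take a clean image as the ground truth and apply noise on top of it, with control over the degradation level. The sketch below illustrates that workflow only. It uses plain salt-and-pepper pixel flipping on a tiny binary image, which is not the character degradation model of the cited paper. The function name `add_speckle_noise`, its parameters, and the sample image are all hypothetical.

```python
import random


def add_speckle_noise(clean, flip_prob, seed=0):
    """Return a degraded copy of a binary image (list of rows of 0/1).

    Each pixel is flipped independently with probability flip_prob.
    Assumption: this is simple salt-and-pepper noise, chosen for
    illustration; it is NOT the degradation model of the cited paper.
    """
    rng = random.Random(seed)  # seeded so the same degradation can be regenerated
    return [
        [1 - px if rng.random() < flip_prob else px for px in row]
        for row in clean
    ]


# Hypothetical 5x8 "clean" image: 1 = ink, 0 = background.
clean = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 1, 0],
    [0, 1, 0, 0, 0, 0, 1, 0],
    [0, 1, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]

# The clean image is left untouched and serves as pixel-level ground truth.
noisy = add_speckle_noise(clean, flip_prob=0.1, seed=42)
changed = sum(a != b for ra, rb in zip(clean, noisy) for a, b in zip(ra, rb))
print(f"{changed} of {sum(len(r) for r in clean)} pixels changed")
```

Because the clean input is never modified, each degraded image keeps an exact pixel-level ground truth, and varying `flip_prob` gives the kind of control over degradation level that the citing papers mention.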