An efficient parametrization of character degradation model for semi-synthetic image generation

Kieu, van Cuong; Visani, Muriel; Journet, Nicholas; Mullot, Rémy; Domenger, Jean‐Philippe

doi:10.1145/2501115.2501127

Cited by 8 publications

(11 citation statements)

References 9 publications


“…The dataset was increased to 500 images by adding noise using the degradation model discussed in [18]. Some sample images are shown in Figure 2(a) & (b).…”

Section: Experimental Evaluations (mentioning)

confidence: 99%


Automatic Selection of Parameters for Document Image Enhancement Using Image Quality Assessment

Garg

Chaudhury

2016

2016 12th IAPR Workshop on Document Analysis Systems (DAS)


Performance of most of the recognition engines for document images is affected by the quality of the image being processed and the selection of parameter values for the pre-processing algorithm. Usually the choice of such parameters is done empirically. In this paper, we propose a novel framework for automatic selection of optimal parameters for the pre-processing algorithm by estimating the quality of the document image. Recognition accuracy can be used as a metric for document quality assessment. We learn filters that capture the script properties and degradation to predict recognition accuracy. An EM-based framework has been formulated to iteratively learn optimal parameters for document image pre-processing. In the E-step, we estimate the expected accuracy using the current set of parameters and filters. In the M-step, we compute parameters to maximize the expected recognition accuracy found in the E-step. The experiments validate the efficacy of the proposed methodology for document image pre-processing applications.


“…2) Hindi Dataset: The Hindi dataset contains a total of 470 text paragraphs extracted from grayscale document images [21]. In order to introduce variations in terms of noise and degradation, we add noise as cuts and merges to images using the degradation model discussed in [18]. Some sample images are shown in Figure 2(c) & (d).…”

Section: Experimental Evaluations (mentioning)

confidence: 99%

“…Such methods can roughly be classified into three types [12]. They are (i) adding noise, [10] (ii) degrading characters, [13] and (iii) distorting the shape of document images [12]. Kieu et.…”

Section: Introduction (mentioning)

confidence: 99%

A Method to Generate Synthetically Warped Document Image

Garai

Biswas

Mandal

et al. 2020

Communications in Computer and Information Science


Document images captured by digital cameras are often warped and distorted due to different camera angles or document surfaces. A robust technique is needed to correct this kind of distortion. Research on document dewarping suffers from the limited availability of public benchmark datasets. In recent times, deep learning based approaches have been used to solve such problems accurately. Training most deep neural networks requires a large number of document images, and generating such a large volume of document images manually is difficult. In this paper, we propose a technique to generate a synthetic warped image from a flat-bedded scanned document image. It is done by calculating warping factors for each pixel position using two warping position parameters (WPP) and eight warping control parameters (WCP). These parameters can be specified as needed depending upon the desired warping. The results are compared with similar real captured images both qualitatively and quantitatively.


“…Using synthetic data or synthetically degraded data has many advantages over human supervision including rapid generation of datasets at lower cost, control of degradation level, and fit testing of the same underlying document content with different corruption methods [Baird 2007;Kieu, Visani, Journet, Mullot, et al 2013;Varga et al 2003;Zi et al 2004]. The main idea is to take a clean image as the ground truth and apply several distortions and noise on top of it.…”

Section: Noising Methods (mentioning)

confidence: 99%

“…In particular, DIB evaluation is usually computed at pixel level (Section 4.4), and it requires an accurate ground truth, with the inherent complexity of data supervision at this detail level. Generating synthetic data for training and evaluating document image processing systems is a topic that has been widely addressed in recent years [Baird 2007;Kieu, Visani, Journet, Mullot, et al 2013;Varga et al 2003;Zi et al 2004]. To overcome this issue, there are several techniques to generate useful ground truths.…”

Section: Ground Truth Generation (mentioning)

confidence: 99%

Neural Networks for Document Image and Text Processing

Pellicer

