Proceedings of Sixth International Conference on Document Analysis and Recognition
DOI: 10.1109/icdar.2001.953967

Synthetic data for Arabic OCR system development

Abstract: A system for the automatic generation of synthetic databases for the development or evaluation of Arabic word or text recognition systems (Arabic OCR) is presented. The proposed system works without any scanning of printed paper. Firstly, Arabic text has to be typeset using a standard typesetting system. Secondly, a noise-free bitmap of the document and the corresponding ground truth (GT) are automatically generated. Finally, an image distortion can be superimposed on the character or word image to simulate the ex…
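
The final step of the abstract (superimposing a distortion on a noise-free bitmap) can be illustrated with a minimal sketch. The abstract does not say which distortion model the authors use, so the salt-and-pepper noise below is a stand-in chosen for simplicity, and the function name `add_salt_and_pepper` and its parameters are hypothetical.

```python
import random


def add_salt_and_pepper(bitmap, flip_prob, seed=None):
    """Return a copy of a binary bitmap with pixels flipped at random.

    Assumption: the bitmap is a list of rows holding 0 (background) or
    1 (ink). This noise model is illustrative only; it is not the
    distortion described in the paper.
    """
    rng = random.Random(seed)  # private generator, so a seed makes runs repeatable
    return [
        [1 - px if rng.random() < flip_prob else px for px in row]
        for row in bitmap
    ]


# A tiny noise-free "word image" and a degraded copy of it.
clean = [[0, 1, 1], [1, 0, 0]]
noisy = add_salt_and_pepper(clean, flip_prob=0.1, seed=42)
```

Because the clean bitmap is never modified, the ground truth stays paired with the original image while any number of degraded variants can be derived from it.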

Cited by 25 publications (15 citation statements)
References 5 publications

“…Some techniques may involve parameters which require expert knowledge for calibration while other parameters may be trained from the data available. Moreover, the number of parameters that need to be trained also places some constraint on the minimum data required to robustly train the model [54]. But sometimes, more parameters provide increased flexibility in deciding the desired quality and property of the synthesized text.…”
Section: Parameterization (mentioning)
confidence: 99%
“…These values were randomly selected by a uniform distribution U(a, b), where (a, b) are the minimum and maximum values, as follows: A_x = h_x / U(20, 70) and A_y = h_y / U(20, 70) for the horizontal and vertical amplitude respectively, and for the number of periods: N_x = U(0.1, 2) and N_y = U(0.1, 2). Finally, the new signature trajectory plan is obtained linking the new dots using Bresenham's line.…”
Section: Intra-stroke Variability (mentioning)
confidence: 99%
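
The parameter sampling quoted above can be sketched as follows. The quotation does not define h_x and h_y or show how the amplitudes and period counts are applied, so this sketch assumes they are the width and height of the signature and that the distortion is a sinusoidal shift of each point. The names `sample_distortion_params` and `displace` are hypothetical.

```python
import math
import random


def sample_distortion_params(h_x, h_y, rng):
    """Draw amplitudes and period counts from the quoted uniform ranges."""
    return {
        "A_x": h_x / rng.uniform(20, 70),  # horizontal amplitude
        "A_y": h_y / rng.uniform(20, 70),  # vertical amplitude
        "N_x": rng.uniform(0.1, 2),        # number of horizontal periods
        "N_y": rng.uniform(0.1, 2),        # number of vertical periods
    }


def displace(points, h_x, h_y, params):
    """Shift each (x, y) point along a sine wave.

    Assumption: the displacement form below is illustrative; the quoted
    passage does not give the exact mapping.
    """
    moved = []
    for x, y in points:
        dx = params["A_x"] * math.sin(2 * math.pi * params["N_x"] * y / h_y)
        dy = params["A_y"] * math.sin(2 * math.pi * params["N_y"] * x / h_x)
        moved.append((x + dx, y + dy))
    return moved


rng = random.Random(0)
params = sample_distortion_params(h_x=700, h_y=300, rng=rng)
distorted = displace([(0, 0), (350, 150), (700, 300)], 700, 300, params)
```

Joining the displaced points with line segments, for example with Bresenham's algorithm as the quotation mentions, is left out of this sketch.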
“…Generating duplicated synthetic handwriting samples of a person from a limited number of real samples has already been proposed [1], [2]. Focusing on handwriting signatures, different strategies have been used, which can be classified into two approaches:…”
Section: Introduction (mentioning)
confidence: 99%
“…In [12], synthetic data is generated and used for training a Hidden Markov Model (HMM) based Arabic OCR system. Symbolic ground truth is keyed in and formatted in a LaTeX environment, while the noise-free images are obtained from the DVI files.…”
Section: Synthetic Data Sets and Ground Truth Generation (mentioning)
confidence: 99%