Synthetic data for Arabic OCR system development

Märgner, Volker; Pechwitz, Mario

doi:10.1109/icdar.2001.953967

Cited by 25 publications

(15 citation statements)

References 5 publications


“…Some techniques may involve parameters which require expert knowledge for calibration while other parameters may be trained from the data available. Moreover, the number of parameters that need to be trained also places some constraint on the minimum data required to robustly train the model [54]. But sometimes, more parameters provide increased flexibility in deciding the desired quality and property of the synthesized text.…”

Section: Parameterization (mentioning)

confidence: 99%

Handwriting synthesis: classifications and techniques

Elarian

Abdel-Aal

Ahmad

et al. 2014

IJDAR


Handwriting synthesis is the automatic generation of data that resemble natural handwriting. Although handwriting synthesis has recently gained increasing interest, the area still lacks a stand-alone review. This paper provides classifications for the different aspects of handwriting synthesis. It presents the applications, techniques, and evaluation methods for handwriting synthesis based on the several aspects that we identify. Then, it discusses various synthesis techniques. To the best of our knowledge, this paper is the only stand-alone survey on this topic, and we believe it can serve as a useful reference for the researchers in the field of handwriting synthesis.


“…These values were randomly selected by a uniform distribution U(a, b), where (a, b) are the minimum and maximum values, as follows: A_x = h_x / U(20, 70) and A_y = h_y / U(20, 70) for the horizontal and vertical amplitude respectively and for the number of periods: N_x = U(0.1, 2) and N_y = U(0.1, 2). Finally, the new signature trajectory plan is obtained linking the new dots using Bresenham's line.…”

Section: Intra-stroke Variability (mentioning)

confidence: 99%
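
The sampling and linking steps in the statement above can be sketched in Python. This is only an illustration under stated assumptions: the excerpt does not define h_x and h_y (they are treated here as image width and height in pixels), it does not show how the amplitudes and periods are applied to the trajectory, and the function names are hypothetical rather than taken from the cited paper. Only the uniform ranges and the use of Bresenham's line come from the quote.

```python
import random


def sample_distortion_params(h_x, h_y, rng=random):
    """Draw amplitudes and period counts using the ranges given in the quote.

    h_x and h_y are assumed to be the image width and height in pixels.
    """
    return {
        "A_x": h_x / rng.uniform(20, 70),  # horizontal amplitude
        "A_y": h_y / rng.uniform(20, 70),  # vertical amplitude
        "N_x": rng.uniform(0.1, 2),        # number of horizontal periods
        "N_y": rng.uniform(0.1, 2),        # number of vertical periods
    }


def bresenham(x0, y0, x1, y1):
    """Return the integer grid points on the line from (x0, y0) to (x1, y1)."""
    points = []
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    while True:
        points.append((x0, y0))
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:  # step along x
            err += dy
            x0 += sx
        if e2 <= dx:  # step along y
            err += dx
            y0 += sy
    return points


def link_dots(dots):
    """Join consecutive dots with Bresenham segments, without repeating joints."""
    path = []
    for (x0, y0), (x1, y1) in zip(dots, dots[1:]):
        segment = bresenham(x0, y0, x1, y1)
        path.extend(segment if not path else segment[1:])
    return path
```

Passing a seeded `random.Random` instance as `rng` makes the sampled parameters reproducible, which helps when comparing duplicated samples across runs.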

“…Generating duplicated synthetic handwriting samples of a person from a limited number of real samples has already been proposed [1], [2]. Focusing on handwriting signatures, different strategies have been used, which can be classified into two approaches:…”

Section: Introduction (mentioning)

confidence: 99%

Cognitive Inspired Model to Generate Duplicated Static Signature Images

Diaz

Ferrer

Morales

2014

2014 14th International Conference on Frontiers in Handwriting Recognition


The handwriting signature is one of the most popular behavioral biometric traits for person recognition. Such recognition systems capture the personal signing behaviour and its variability based on a limited number of enrolled signatures. In this paper a cognitive inspired model based on motor equivalence theory is developed to duplicate off-line signatures from one real on-line seed. This model achieves duplicated signatures with a natural variability. It is validated with an off-line signature verifier based on texture features and a SVM classifier. The results manifest the complementarity of the duplicated signatures and the utility of the model.


“…In [12], synthetic data is generated and used for training a Hidden Markov Model (HMM) based Arabic OCR system. Symbolic ground truth is keyed in and formatted in a LATEX environment, while the noise free images are obtained from the DVI files.…”

Section: Synthetic Data Sets and Ground Truth Generation (mentioning)

confidence: 99%

Groundtruth Generation and Document Image Degradation

Zi

2005


The problem of generating synthetic data for the training and evaluation of document analysis systems has been widely addressed in recent years. With the increased interest in processing multilingual sources, however, there is a tremendous need to be able to rapidly generate data in new languages and scripts, without the need to develop specialized systems. We have developed a system, which uses language support of the MS Windows operating system combined with custom print drivers to render tiff images simultaneously with windows Enhanced Metafile directives. The metafile information is parsed to generate zone, line, word, and character ground truth including location, font information and content in any language supported by Windows. The resulting images can be physically or synthetically degraded by our degradation modules, and used for training and evaluating Optical Character Recognition (OCR) systems. Our document image degradation methodology incorporates several often-encountered types of noise at the page and pixel levels. Examples of OCR evaluation and synthetically degraded document images are given to demonstrate the effectiveness.
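
As a rough illustration of what pixel-level synthetic degradation can look like, the Python sketch below flips pixels of a binary image at random. The abstract does not specify its noise models, so this independent pixel-flip noise is a stand-in chosen for simplicity, not the method of the cited work; the function name, the `flip_prob` and `seed` parameters, and the list-of-lists image representation are all assumptions.

```python
import random


def flip_pixels(image, flip_prob, seed=None):
    """Return a copy of a binary image with each pixel flipped independently.

    image: list of rows, each a list of 0/1 values (assumed representation).
    flip_prob: probability of inverting any given pixel.
    seed: optional seed so a degraded sample can be reproduced.
    """
    rng = random.Random(seed)
    return [
        [1 - px if rng.random() < flip_prob else px for px in row]
        for row in image
    ]


# A tiny 3x5 "clean" image: one horizontal stroke on a blank background.
clean = [
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0],
]
noisy = flip_pixels(clean, flip_prob=0.1, seed=42)
```

Because the clean image is left untouched, any ground truth attached to it still applies to the degraded copy.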

