An offline recognition system for Arabic handwritten words is presented. The system is based on a semi-continuous one-dimensional HMM. Normalization parameters are estimated from each binary word image. First, height, length, and baseline skew are normalized; then features are collected using a sliding-window approach. This paper presents these methods in more detail. Some parameters were modified, and the consequent effect on the recognition results is discussed. Significant tests were performed using the new IfN/ENIT-database of handwritten Arabic words. The comprehensive database consists of 26459 Arabic words (Tunisian town/village names) handwritten by 411 different writers and is free for non-commercial research. In the performed tests we achieved maximal recognition rates of about 89% on a word level.
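As a rough illustration of the sliding-window idea in the abstract above, the following minimal sketch turns a binary word image into a sequence of feature vectors, one per window position. The function name, the parameters (`width`, `step`, `zones`, `right_to_left`), and the ink-density features are assumptions made for this example; the abstract does not state which features or window settings the authors used.

```python
def sliding_window_features(image, width=3, step=1, zones=4, right_to_left=True):
    """Return one feature vector per window position.

    image: list of rows, each a list of 0/1 values (1 = ink); all rows have
    the same length, and the image has at least `zones` rows.
    Each feature is the fraction of ink pixels in one horizontal zone of the
    window (a hypothetical feature choice, used only for illustration).
    """
    height = len(image)
    length = len(image[0])

    # Left edge of every window that fits completely inside the image.
    starts = list(range(0, length - width + 1, step))
    if right_to_left:
        # Arabic is written right to left, so visit the windows in that order.
        starts.reverse()

    frames = []
    for x0 in starts:
        frame = []
        for z in range(zones):
            # Row range covered by zone z.
            y0 = z * height // zones
            y1 = (z + 1) * height // zones
            ink = sum(image[y][x]
                      for y in range(y0, y1)
                      for x in range(x0, x0 + width))
            frame.append(ink / ((y1 - y0) * width))
        frames.append(frame)
    return frames
```

The resulting list of frames is the kind of observation sequence a one-dimensional HMM consumes; in a real system the image would first be normalized for height, length, and baseline skew, as the abstract describes.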
This paper describes the Arabic handwriting recognition competition held at ICDAR 2009. This third competition (the first was at ICDAR 2005 and the second at ICDAR 2007) again used the IfN/ENIT-database with Arabic handwritten Tunisian town names. Today, more than 82 research groups from universities, research centers, and industry are working with this database worldwide. This year, 7 groups with 17 systems participated in the competition. The systems were tested on known data and on two data sets unknown to the participants. The systems were compared based on the most important characteristic: the recognition rate. Additionally, the relative speed of the different systems was compared. Finally, a short description of the participating groups and their systems is given, and the results achieved are presented.
A system for the automatic generation of synthetic databases for the development or evaluation of Arabic word or text recognition systems (Arabic OCR) is presented. The proposed system works without any scanning of printed paper. First, Arabic text has to be typeset using a standard typesetting system. Second, a noise-free bitmap of the document and the corresponding ground truth (GT) are automatically generated. Finally, an image distortion can be superimposed on the character or word image to simulate the expected real-world noise of the intended application. All necessary modules are presented together with some examples. Special problems caused by specific features of Arabic, such as printing from right to left, many diacritical points, variation in the height of characters, and changes in the relative position to the writing line, are suggested. The synthetic data set was used to train and test a recognition system for Arabic printed words based on a Hidden Markov Model (HMM), which was originally developed for German cursive script. Recognition results with different synthetic data sets are presented.
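The distortion step in the abstract above can be pictured with a small sketch. It applies salt-and-pepper noise to a noise-free binary bitmap. This noise model, the function name `add_salt_and_pepper`, and the parameters `flip_prob` and `seed` are assumptions chosen for illustration; the abstract does not specify which distortions the authors superimpose.

```python
import random


def add_salt_and_pepper(image, flip_prob=0.05, seed=0):
    """Return a noisy copy of a binary bitmap.

    image: list of rows, each a list of 0/1 values (1 = ink).
    Each pixel is inverted independently with probability flip_prob.
    A fixed seed keeps the synthetic data reproducible, so the ground
    truth generated alongside the clean bitmap stays valid.
    """
    rng = random.Random(seed)
    return [[1 - pixel if rng.random() < flip_prob else pixel
             for pixel in row]
            for row in image]
```

Because the clean bitmap and its ground truth are produced together, a distortion like this can be applied at several strengths to build data sets of varying difficulty without any manual labeling.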