This article comprehensively surveys Arabic Online Handwriting Recognition (AOHR). We address the challenges posed by online handwriting recognition, including ligatures, dot and diacritic problems, online/offline touching of text, and geometric variations. We then present a general model of an AOHR system that incorporates its different phases. We summarize the main AOHR databases and identify their uses and limitations. Preprocessing techniques used in AOHR, viz. normalization, smoothing, de-hooking, baseline identification, and delayed stroke processing, are presented with illustrative examples. We discuss different techniques for Arabic online handwriting segmentation at the character and morpheme levels and identify their limitations. Feature extraction techniques used in AOHR are discussed and their challenges identified. We address the classification of non-cursive (characters and digits) and cursive Arabic online handwriting and analyze the applications of the techniques involved. We discuss different classification techniques, viz. structural approaches, Support Vector Machines (SVM), Fuzzy SVM, neural networks, Hidden Markov Models, genetic algorithms, decision trees, and rule-based systems, and analyze their performance. Post-processing techniques are also discussed. Several tables that summarize the surveyed publications are provided for ease of reference and comparison. We summarize the current limitations and difficulties of AOHR and outline future directions of research.
Incompleteness is one of the most problematic data quality challenges in real-world machine learning tasks. A large number of studies have addressed this challenge. However, most existing studies focus on classification, and only a limited number address symbolic regression with missing values. In this work, a new imputation method for symbolic regression with incomplete data is proposed. The method aims to improve both the effectiveness and the efficiency of imputing missing values for symbolic regression. It is based on genetic programming (GP) and weighted K-nearest neighbors (KNN). It constructs GP-based models that use the other available features to predict the missing values of incomplete features. The instances used for constructing such models are selected using weighted KNN. Experimental results on real-world data sets show that the proposed method outperforms a number of state-of-the-art methods with respect to imputation accuracy, symbolic regression performance, and imputation time.
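The instance-selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the distance measure, the weighting scheme, or the value of K, so Euclidean distance over the observed features, inverse-distance weights, and the default `k=3` are all assumptions. The GP-based model is also replaced here by a plain weighted average of the selected neighbors, simply to keep the example self-contained; in the proposed method a GP model would be built from the selected instances instead. The function name `weighted_knn_impute` is hypothetical.

```python
import math


def weighted_knn_impute(complete_rows, incomplete_row, target_idx, k=3):
    """Estimate incomplete_row[target_idx] from the k nearest complete rows.

    Assumption: missing values are marked with None, and every row in
    complete_rows has no missing values.
    """
    # Features that are observed in the incomplete row (target excluded).
    observed = [i for i, v in enumerate(incomplete_row)
                if v is not None and i != target_idx]

    def distance(row):
        # Euclidean distance computed over the observed features only.
        return math.sqrt(sum((row[i] - incomplete_row[i]) ** 2
                             for i in observed))

    # Select the k complete instances closest to the incomplete one.
    neighbors = sorted(complete_rows, key=distance)[:k]

    # Inverse-distance weights; eps avoids division by zero for exact matches.
    eps = 1e-9
    weights = [1.0 / (distance(row) + eps) for row in neighbors]

    # Stand-in for the GP model: weighted average of the neighbors' targets.
    total = sum(weights)
    return sum(w * row[target_idx] for w, row in zip(weights, neighbors)) / total
```

For instance, `weighted_knn_impute([[0.0, 0.0], [2.0, 10.0]], [1.0, None], target_idx=1, k=2)` averages the two equally distant neighbors. Restricting model construction to the nearest instances is what the abstract credits for reducing imputation time, since each model is fitted on a small subset rather than the whole data set.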