In recent years, Deep Learning models have become indispensable in several fields such as computer vision, automatic object recognition, and automatic natural language processing. The implementation of a robust and efficient handwritten text recognition system remains a challenge for the research community in this field, especially for the Arabic language, which, compared to other languages, has a dearth of published works. In this work, we presented an efficient and new system for offline Arabic handwritten text recognition. Our new approach is based on the combination of a Convolutional Neural Network (CNN) and a Bidirectional Long-Term Memory (BLSTM) followed by a Connectionist Temporal Classification layer (CTC). Moreover, during the training phase of the model, we introduce an algorithm of data augmentation to increase the quality of data. Our proposed approach can recognize Arabic handwritten texts without the need to segment the characters, thus overcoming several problems related to this point. To train and test (evaluate) our approach, we used two Arabic handwritten text recognition databases, which are IFN/ENIT and KHATT. The Experimental results show that our new approach, compared to other methods in the literature, gives better results.
Arabic language is rich and complex in terms of word morphology compared to other Latin languages. Recently, natural language processing (NLP) field emerges with many researches targeting Arabic language understanding (ALU). In this context, this work presents our developed approach based on the Arabic bidirectional encoder representations from transformers (AraBERT) model where the main required steps are presented in detail. We started by the input text pre-processing, which is, then, segmented using the Farasa segmentation technique. In the next step, the AraBERT model is implemented with the pertinent parameters. The performance of our approach has been evaluated using the ARev dataset which contains more than 40,000 comments-remarks records relate to the tourism sector such as hotel reviews, restaurant reviews and others. Moreover, the obtained results are deeply compared with other relevant states of the art methods, and it shows the competitiveness of our approach that gives important results that can serve as a guide for further improvements in this field.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.