In pattern recognition, handwritten character recognition (HCR) is considered a classical challenge. Because datasets are unavailable for many languages, training a recognition system is difficult; in particular, benchmark datasets for HCR in the Gujarati language are limited. A suitable dataset is therefore required for experimentation. This work introduces dataset generation for the Gujarati language using pre-processing and classification techniques. First, handwritten data are collected from various native Gujarati writers. The dataset is then generated in three stages: pre-processing, segmentation, and classification. The pre-processing stage comprises image selection, noise removal, normalization, conversion of integer pixel values to double, conversion of the grayscale image into a binary image, dimensionality reduction, and vector conversion. The pre-processed image is then segmented using line segmentation, character segmentation, and word segmentation. For training and testing, the data are converted into CSV format, which represents each sample numerically. Finally, the data are classified using a convolutional neural network (CNN), which achieves a kappa value of 0.981 and a false positive rate (FPR) of 0.189.
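The pre-processing and CSV conversion steps above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: images are assumed to be 2-D lists of 8-bit grayscale values with dark ink on a light background, "normalization" is assumed to mean size normalization by nearest-neighbour resizing, binarization uses a fixed threshold of 0.5, and each CSV row is assumed to hold the class label followed by the pixel values. Noise removal and dimensionality reduction are omitted, and all function names are hypothetical.

```python
import csv
import io


def resize_nearest(image, out_h, out_w):
    """Size normalization (assumed): nearest-neighbour resize of a 2-D pixel grid."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def to_double(image):
    """Integer-to-double step: map 8-bit pixels (0-255) to floats in [0.0, 1.0]."""
    return [[px / 255.0 for px in row] for row in image]


def binarize(image, threshold=0.5):
    """Grayscale to binary: dark (ink) pixels become 1, light background becomes 0."""
    return [[1 if px < threshold else 0 for px in row] for row in image]


def to_vector(image):
    """Vector conversion: flatten the 2-D image row by row into a 1-D list."""
    return [px for row in image for px in row]


def to_csv_row(label, image, size=None, threshold=0.5):
    """Run the sketch pipeline and return one CSV line: label, then pixel values."""
    if size is not None:
        image = resize_nearest(image, *size)
    vector = to_vector(binarize(to_double(image), threshold))
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow([label] + vector)
    return buf.getvalue()


# Usage: a hypothetical 2x2 grayscale glyph, upscaled to 4x4 before binarization.
glyph = [[0, 255], [255, 0]]
print(to_csv_row("ka", glyph, size=(4, 4)), end="")
# prints: ka,1,1,0,0,1,1,0,0,0,0,1,1,0,0,1,1
```

Writing one row per sample keeps the file readable by any CSV-aware training tool; a fixed threshold is the simplest choice, and an adaptive method such as Otsu's would be a natural substitute for unevenly lit scans.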
Optical character recognition (OCR) technologies have made significant progress in language recognition. Gujarati is more difficult to recognize than many other languages because of its curves, closed loops, modifiers, and joint (conjunct) characters, so considerable effort has gone into Gujarati OCR in the literature. Deep learning-based convolutional neural network (CNN) models have recently been applied to OCR for different languages, but they do not yet perform satisfactorily on Gujarati characters. This paper therefore proposes CNN models for recognizing printed Gujarati characters and handwritten Gujarati numerals. Two optimally configured CNNs, CNN-PGC (CNN for Printed Gujarati Character) and CNN-HGC (CNN for Handwritten Gujarati Character), are presented for printed Gujarati base characters and handwritten numerals, respectively. The performance of the proposed models is evaluated on specific performance metrics and compared with traditional models and recent baseline methods. Experiments are carried out on a newly generated, well-segmented dataset of Gujarati base characters and numerals comprising 36 consonants, 13 vowels, and 10 handwritten numerals. Variations in the data, such as size, skew, noise, and blur, are also considered during the experiments. Even in the presence of printing irregularities, writing irregularities, and degradations, the proposed method achieves a recognition rate of 98.08% for printed characters and 95.24% for handwritten numerals, which is better than other existing models.
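The recognition rates above, and the kappa and FPR values reported for the first study, are standard functions of a confusion matrix. The sketch below computes them in plain Python under these assumptions: recognition rate is taken to mean overall accuracy, rows of the matrix index the true class and columns the predicted class, and because neither abstract states how FPR is aggregated across classes, an unweighted (macro) average of per-class false positive rates is assumed. Function names and the 2-class example matrix are hypothetical.

```python
# Assumed convention: cm[i][j] = number of samples whose true class is i
# and whose predicted class is j.


def recognition_rate(cm):
    """Overall accuracy: correctly classified samples / all samples."""
    total = sum(sum(row) for row in cm)
    return sum(cm[k][k] for k in range(len(cm))) / total


def cohen_kappa(cm):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    p_o = sum(cm[k][k] for k in range(n)) / total                  # observed agreement
    row_sums = [sum(cm[k]) for k in range(n)]                      # true-class counts
    col_sums = [sum(cm[i][k] for i in range(n)) for k in range(n)]  # predicted-class counts
    p_e = sum(r * c for r, c in zip(row_sums, col_sums)) / total ** 2  # chance agreement
    return (p_o - p_e) / (1 - p_e)


def macro_fpr(cm):
    """Per-class false positive rate FP / (FP + TN), averaged over classes."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    rates = []
    for k in range(n):
        col_k = sum(cm[i][k] for i in range(n))
        fp = col_k - cm[k][k]                       # predicted k, truly another class
        tn = total - sum(cm[k]) - col_k + cm[k][k]  # neither true k nor predicted k
        rates.append(fp / (fp + tn))
    return sum(rates) / n


cm = [[45, 5], [10, 40]]
print(recognition_rate(cm), cohen_kappa(cm), macro_fpr(cm))  # roughly 0.85, 0.7, 0.15
```

Kappa is undefined when chance agreement equals 1 (for example, when every sample falls in a single class), so a production version would guard that division.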