Handwritten character recognition (HCR) is a classical challenge in pattern recognition. Because datasets are unavailable for many languages, training a recognition system is difficult; in particular, benchmark datasets for HCR in the Gujarati language are limited. To address this gap, this work introduces a dataset generation procedure for the Gujarati language based on pre-processing and classification techniques. Handwritten data are first collected from various native Gujarati writers. The dataset is then generated in three stages: pre-processing, segmentation, and classification. Pre-processing comprises image selection, noise removal, normalization, conversion of integer values to double, conversion of the grayscale image to a binary image, dimensionality reduction, and vector conversion. The pre-processed image is then segmented using line segmentation, character segmentation, and word segmentation. For training and testing, the data are transformed into CSV file format, which represents the information as numbers. Finally, the data are classified using a convolutional neural network (CNN). The kappa and false positive rate (FPR) values achieved by the CNN are 0.981 and 0.189, respectively.
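As a rough illustration of the pre-processing and CSV steps described above, the following minimal sketch converts integer pixel values to double, normalizes them, binarizes the image, flattens it to a vector, and writes one labelled CSV row. It is not the authors' implementation: the function names `preprocess` and `to_csv_row`, the fixed threshold of 0.5, the "ink is dark" convention, and the label `"ka"` are all assumptions made for the example, and noise removal, dimensionality reduction, and segmentation are omitted.

```python
import csv
import io


def preprocess(gray, threshold=0.5):
    """Turn a 2-D list of 8-bit grayscale values (0-255) into a binary vector.

    The threshold and the dark-ink convention are illustrative assumptions.
    """
    # Integer -> double, normalized to the range [0, 1].
    norm = [[float(p) / 255.0 for p in row] for row in gray]
    # Grayscale -> binary: dark pixels (ink) become 1, light background 0.
    binary = [[1 if p < threshold else 0 for p in row] for row in norm]
    # Vector conversion: flatten the image row by row.
    return [p for row in binary for p in row]


def to_csv_row(label, vector):
    """Serialize one sample as 'label,pixel0,pixel1,...' (hypothetical layout)."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow([label] + vector)
    return buf.getvalue()


# A tiny 3x3 "image" with a dark vertical stroke in the middle column.
sample = [
    [255, 30, 255],
    [255, 20, 255],
    [255, 10, 255],
]
vec = preprocess(sample)
row = to_csv_row("ka", vec)
print(row, end="")
```

In a real pipeline, each segmented character image would yield one such row, and the resulting CSV file would be split into training and testing sets for the CNN.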