Optical Character Recognition (OCR), which converts an image containing text into an editable text format, is of great importance in document image processing. The input to an OCR system can be a scanned document or a simple newspaper cutout. Supervised learning with neural networks yields output with greater accuracy. Unlike English, the Kannada language has a very large character set, because it includes kaagunithas, vattaksharas, and other forms, which makes character recognition much more complex. This paper concentrates on OCR for Kannada text. As a first step, the input image is thresholded to convert it into a binary image, which makes segmentation easier. Characters are then extracted from the document using various segmentation methods, and the vattaksharas are separated from the words using a baseline technique. Once the characters are recognized, they are matched against the Unicode values available on the system and printed. In this method, a CNN plays a pivotal role: it reads each character and compares it with the Unicode lookup table values to produce the output. The system has been tested with varying fonts, using a total of 37 sample documents for experimentation. It has been developed for printed Kannada text only.
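The preprocessing and output stages described above can be illustrated with a minimal sketch. It is not the authors' implementation: the fixed global threshold, the projection-profile segmentation, the function names, and the two-entry lookup table are all assumptions chosen for illustration, and the CNN classifier and the baseline step for vattaksharas are omitted.

```python
import numpy as np


def binarize(gray, threshold=128):
    """Return a boolean image where True marks ink (dark) pixels.

    The fixed threshold of 128 is an assumption; the abstract does not
    say which thresholding method is used.
    """
    return np.asarray(gray) < threshold


def runs(profile):
    """Return (start, end) index pairs of consecutive non-zero entries.

    The end index is exclusive.
    """
    mask = np.concatenate(([False], np.asarray(profile) > 0, [False]))
    changes = np.flatnonzero(mask[1:] != mask[:-1])
    return [(int(s), int(e)) for s, e in zip(changes[0::2], changes[1::2])]


def segment_lines(ink):
    """Split a page into text lines using the horizontal projection profile."""
    return runs(ink.sum(axis=1))


def segment_characters(ink, line):
    """Split one text line into glyph columns using the vertical projection."""
    top, bottom = line
    return runs(ink[top:bottom].sum(axis=0))


# Hypothetical lookup table: classifier class index -> Unicode code points.
# A real table would cover the full Kannada block (U+0C80 to U+0CFF).
UNICODE_TABLE = {
    0: "\u0c95",        # KANNADA LETTER KA
    1: "\u0c95\u0cbe",  # KA + VOWEL SIGN AA, one kaagunitha form
}


def to_text(class_ids):
    """Map predicted class indices to a Unicode string."""
    return "".join(UNICODE_TABLE[i] for i in class_ids)
```

In a full pipeline, each segmented glyph would be cropped, passed to the trained CNN, and the predicted class index fed to `to_text`. Projection profiles are used here only because they are simple to follow; touching or overlapping glyphs would need a more robust segmentation method.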