More than 66 million people in India speak Telugu, a language that dates back thousands of years and is widely spoken in South India. There has not been much progress reported on the advancement of Telugu text Optical Character Recognition (OCR) systems. Telugu characters can be composed of many symbols joined together. OCR is the process of turning a document image into a text-editable one that may be used in other applications. It saves a great deal of time and effort by not having to start from scratch each time. There are hundreds of thousands of different combinations of modifiers and consonants when writing compound letters. Symbols joined to one another form a compound character. Since there are so many output classes in Telugu, there’s a lot of interclass variation. Additionally, there are not any Telugu OCR systems that take use of recent breakthroughs in deep learning, which prompted us to create our own. When used in conjunction with a word processor, an OCR system has a significant impact on real-world applications. In a Telugu OCR system, we offer two ways to improve symbol or glyph segmentation. When it comes to Telugu OCR, the ability to recognise that Telugu text is crucial. In a picture, connected components are collections of identical pixels that are connected to one another by either 4- or 8-pixel connectivity. These connected components are known as glyphs in Telugu. In the proposed research, an efficient deep learning model with Interrelated Tagging Prototype with Segmentation for Telugu Text Recognition (ITP-STTR) is introduced. The proposed model is compared with the existing model and the results exhibit that the proposed model’s performance in text recognition is high.
Images play an essential function in the electronic media to share information. Nowadays, each event is going to be recorded in the arrangement of digital images. Text from the image file won't be in a format on the computer. OCR (Optical Character Recognition) for English vocabulary is well constructed. Currently, there's a requirement of OCR for Indian languages to maintain historical documents composed mainly in Indian languages to arrange publications in the library and for program form processing. OCR for the Telugu language is challenging as consonants and vowels plays a vital role in forming words along with vattus and gunithas. It may be a mixture of vowels and consonants that may form a compound character. This paper presents research on methods utilized in the OCR method for the Telugu Language until today.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.