Transliteration is an important area of natural language processing (NLP). It converts named entities (NEs) written in one script into another script and is useful in applications such as Cross-Lingual Information Retrieval (CLIR), multilingual voice chat, and real-time Machine Translation (MT). The most important requirement of a transliteration system is that the target-language output preserve the phonetic properties of the source-language word. In this paper, we propose named entity transliteration for the Hindi-to-English and Marathi-to-English language pairs using a Support Vector Machine (SVM). In the proposed approach, the source named entity is segmented into transliteration units, so the transliteration problem can be viewed as a sequence labeling problem. The phonetic units are classified using an SVM with a polynomial kernel function. The approach uses two features for transliteration: the phonetics of the source language and n-grams.
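The segmentation and sequence-labeling framing described above can be illustrated with a minimal sketch. It is not the authors' implementation: the segmentation rule (attach combining marks and virama-joined consonants to the preceding unit), the function names `segment_units` and `context_features`, and the window size are all assumptions made for illustration. Feature dictionaries of this kind could then be one-hot encoded and passed to a polynomial-kernel SVM classifier, which is left out here to keep the sketch dependency-free.

```python
import unicodedata

VIRAMA = "\u094d"  # Devanagari sign that joins consonants into a conjunct


def segment_units(word):
    """Split a Devanagari word into rough transliteration units.

    Assumed rule (hypothetical, for illustration): a combining mark, or a
    consonant that follows a virama, is attached to the preceding unit.
    """
    units = []
    for ch in word:
        is_mark = unicodedata.category(ch) in ("Mn", "Mc")
        joins_conjunct = bool(units) and units[-1].endswith(VIRAMA)
        if units and (is_mark or joins_conjunct):
            units[-1] += ch
        else:
            units.append(ch)
    return units


def context_features(units, i, n=1):
    """Build n-gram context features for the unit at position i.

    Each unit is labeled using itself and its n neighbours on either side,
    which is what turns transliteration into a sequence labeling task.
    """
    feats = {}
    for off in range(-n, n + 1):
        j = i + off
        feats[f"u[{off:+d}]"] = units[j] if 0 <= j < len(units) else "<pad>"
    return feats


if __name__ == "__main__":
    units = segment_units("कृष्ण")
    print(units)
    for i in range(len(units)):
        print(context_features(units, i))
```

Each feature dictionary would be paired with a target-side label (the English letters for that unit) when training a classifier; how the paper aligns source units to target labels is not stated in the abstract.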
e-Governance and Web-based commercial multilingual applications have given great importance to the tasks of translation and transliteration. Named entities and technical terms that occur in the source language of a translation are called out-of-vocabulary words because they are not available in the multilingual corpus or dictionary used to support the translation process. These named entities and technical terms need to be transliterated from the source language to the target language without losing their phonetic properties. The fundamental problem in India is that no linguistically grounded set of rules exists for spelling Indian-language words in English. People write different spellings for the same name in different places. This affects the Top-1 accuracy of transliteration and, in turn, the translation process. A major issue we observed is the transliteration of named entities consisting of three syllables or three phonetic units in Hindi and Marathi, where people mix orthographic and phonological approaches when writing the spelling. In this paper, the authors give their opinion, based on experimentation, on the appropriateness of each approach.
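To make the effect of spelling variation on Top-1 accuracy concrete, here is a small sketch. The function name `top1_accuracy` and the toy names are hypothetical and are not data from the paper; the sketch only assumes that Top-1 accuracy counts a name as correct when the system's first-ranked candidate matches an accepted reference spelling.

```python
def top1_accuracy(predictions, references):
    """Fraction of names whose top-ranked candidate is an accepted spelling.

    predictions: list of ranked candidate lists, best candidate first.
    references:  list of sets of accepted spellings, one set per name.
    """
    if not predictions:
        return 0.0
    hits = sum(
        1
        for candidates, accepted in zip(predictions, references)
        if candidates and candidates[0] in accepted
    )
    return hits / len(predictions)


if __name__ == "__main__":
    # Illustrative toy output for two names (not results from the paper).
    predictions = [["kamla", "kamala"], ["sharad"]]

    # Only one spelling per name is accepted.
    single_reference = [{"kamala"}, {"sharad"}]
    # Both spellings in common use are accepted for the first name.
    multi_reference = [{"kamala", "kamla"}, {"sharad"}]

    print(top1_accuracy(predictions, single_reference))
    print(top1_accuracy(predictions, multi_reference))
```

With a single reference spelling per name, a plausible alternative spelling in first position counts as an error, which is one way inconsistent spelling conventions can lower the measured Top-1 accuracy.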
Transactions in almost every domain, such as travel, shopping, insurance, entertainment, hotels, and appointments, are available through Internet-based applications. All of these applications require knowledge of English. As the number of Internet users grows day by day, there is great demand for tools and applications that support Indian languages. Local-language support in Web-based commercial applications can be provided by Machine Translation, which translates static labels on Web forms, and Machine Transliteration, which transliterates dynamic user input from the local language into the default language, English. It is challenging to transliterate names and technical terms occurring in user input across languages with different alphabets and sound inventories. This paper focuses on important issues that frequently occur in Hindi-to-English and Marathi-to-English machine transliteration of named entities.