English to Hindi machine transliteration system at NEWS 2009

Das, Amitava; Ekbal, Asif; Mondal, Tapabrata; Bandyopadhyay, Sivaji

doi:10.3115/1699705.1699726

Cited by 9 publications

(4 citation statements)

References 11 publications

“…Quite a number of transliteration mechanisms have been proposed for some non-English European languages, Russian [6][7][8] and East Asian languages like Chinese [9][10][11][12], Japanese [13][14][15][16][17], Korean [18][19][20][21][22], West Asian languages like Arabic [23][24][25] and the Persian [26,27]. There have been some recent attempts on some Indian languages like Hindi [8][28][29][30][31][32][33][34][35][36][37][38][39], Bengali [33][40][41][42], Punjabi [43], Telugu [44], Kannada [29,45,46] and Tamil [29,31,47]. However, the present state-of-the-art of transliteration for Indian and other South Asian languages can be considered to be in the initial stage.…”

Section: *For Correspondence (mentioning)

confidence: 99%

Machine transliteration and transliterated text retrieval: a survey

Prabhakar

Pal

2018

Sādhanā

Users of the WWW across the globe are increasing rapidly. According to Internet live stats there are more than 3 billion Internet users worldwide today and the number of non-English native speakers is quite high there. A large proportion of these non-English speakers access the Internet in their native languages but use the Roman script to express themselves through various communication channels like messages and posts. With the advent of Web 2.0, user-generated content is increasing on the Web at a very rapid rate. A substantial proportion of this content is transliterated data. To leverage this huge information repository, there is a matching effort to process transliterated text. In this article, we survey the recent body of work in the field of transliteration. We start with a definition and discussion of the different types of transliteration followed by various deterministic and non-deterministic approaches used to tackle transliteration-related issues in machine translation and information retrieval. Finally, we study the performance of those techniques and present a comparative analysis of them.

“…We also note that combination of several different models via re-ranking of their outputs (CRF, Maximum Entropy Model, Margin Infused Relaxed Algorithm) proves to be very successful (Oh et al, 2009); their system (reported as Team ID 6) produced the best or second-best transliteration performance consistently across all metrics, in all tasks, except Japanese back-transliteration. Examples of other model combinations are (Das et al, 2009). At least two teams (reported as Team IDs 14 and 27) incorporate language origin detection in their system (Bose and Sarkar, 2009;Khapra and Bhattacharyya, 2009).…”

Section: Standard Runs (mentioning)

confidence: 99%

Report of NEWS 2009 machine transliteration shared task

Kumaran

Pervouchine

et al. 2009

Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration - NEWS '09

This report documents the details of the Machine Transliteration Shared Task conducted as a part of the Named Entities Workshop (NEWS), an ACL-IJCNLP 2009 workshop. The shared task features machine transliteration of proper names from English to a set of languages. This shared task has witnessed enthusiastic participation of 31 teams from all over the world, with diversity of participation for a given system and wide coverage for a given language pair (more than a dozen participants per language pair). Diverse transliteration methodologies are represented adequately in the shared task for a given language pair, thus underscoring the fact that the workshop may truly indicate the state of the art in machine transliteration in these language pairs. We measure and report 6 performance metrics on the submitted results. We believe that the shared task has successfully achieved the following objectives: (i) bringing together the community of researchers in the area of Machine Transliteration to focus on various research avenues, (ii) calibrating systems on common corpora, using common metrics, thus creating a reasonable baseline for the state-of-the-art of transliteration systems, and (iii) providing a quantitative basis for meaningful comparison and analysis between various algorithmic approaches used in machine transliteration. We believe that the results of this shared task would uncover a host of interesting research problems, giving impetus to research in this significant research area. Language pairs listed: English to Hindi, English to Tamil, English to Kannada, English to Russian, English to Chinese, English to Korean, English to Japanese Katakana.

“…CRF on the English to Korean transliteration and Hindi-English names respectively is suggested in [17]. A transliteration scheme that involved English to Hindi language pair from news 2009 transliteration task dataset is in [18]. The methodology incorporated English and Hindi contextual information for calculating the probabilities and chose the one which has a maximum probability and further improved the algorithm by applying postprocessing rules.…”

Section: Introduction (mentioning)

confidence: 99%

Hindi to English transliteration using multilayer gated recurrent units

Ansari

Ahmad

Beg

et al. 2022

IJEECS

Transliteration is the task of translating text from source script to target script provided that the language of the text remains the same. In this work, we perform transliteration on less explored Devanagari to Roman Hindi transliteration and its back transliteration. The neural transliteration model in this work is based on a sequence-to-sequence neural network that is composed of two major components, an encoder that transforms source language words into a meaningful representation and the decoder that is responsible for decoding the target language words. We utilize gated recurrent units (GRU) to design the multilayer encoder and decoder network. Among the several models, the multilayer model shows the best performance in terms of character error rate (CER) and word error rate (WER). The method generates quite satisfactory predictions in Hindi-English bilingual machine transliteration with WER of 64.8% and CER of 20.1% which is a significant improvement over existing methods.
