2010
DOI: 10.1145/1838751.1838753

Transliteration for Resource-Scarce Languages

Abstract: Today, parallel corpus-based systems dominate the transliteration landscape. But resource-scarce languages do not enjoy the luxury of large parallel transliteration corpora. For these languages, rule-based transliteration is the only viable option. In this article, we show that by properly harnessing monolingual resources in conjunction with a manually created rule base, one can achieve reasonable transliteration performance. We achieve this performance by exploiting the power of Character Sequence Modelin…

Cited by 17 publications (13 citation statements)
References 26 publications (21 reference statements)
“…In terms of transliteration unit, existing machine transliteration models can be classified into three categories: phoneme-based (Knight and Graehl, 1997; Lee and Choi, 1998; Wan and Verspoor, 1998; Jung et al., 2000; Meng et al., 2001; Oh and Choi, 2002; Virga and Khudanpur, 2003; Gao et al., 2005), grapheme-based (…; Ekbal et al., 2006; Ganesh et al., 2008; Das et al., 2009; Chinnakotla et al., 2010; Finch and Sumita, 2010), and hybrid (Al-Onaizan and Knight, 2002; Bilac and Tanaka, 2004; Oh and Choi, 2005; Oh et al., 2006; Kim et al., 1999).…”
Section: Related Work; mentioning (confidence: 99%)
“…Chinnakotla et al. (2010) generate transliteration candidates using manually developed character mapping rules and rerank them with a character language model. The major limitations are: (i) character transliteration probability is not learnt, so there is undue reliance on the language model to handle ambiguity, and (ii) significant manual effort is required to achieve good coverage of mapping rules.…”
Section: Related Work; mentioning (confidence: 99%)
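The generate-then-rerank scheme described in the passage above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the rule table `RULES` is a hypothetical toy example, the language model is an add-one-smoothed character bigram model trained on a small hypothetical target-language word list, and all function names are illustrative. Candidates are scored only by the language model, which mirrors limitation (i): no character transliteration probability is learnt.

```python
import itertools
import math
from collections import Counter

# Hypothetical toy rule base: each source character maps to one or more
# target strings. Latin letters stand in for real source/target scripts.
RULES = {
    "k": ["k", "c"],
    "a": ["a"],
    "m": ["m"],
    "l": ["l"],
}

BOS, EOS = "^", "$"  # word-boundary markers


def train_char_bigram(words):
    """Count character bigrams (with boundary markers) in a monolingual word list."""
    bigrams, unigrams = Counter(), Counter()
    for w in words:
        chars = [BOS] + list(w) + [EOS]
        for a, b in zip(chars, chars[1:]):
            bigrams[(a, b)] += 1
            unigrams[a] += 1
    # Possible "next symbols": every observed character plus the end marker.
    vocab_size = len({c for w in words for c in w} | {EOS})
    return bigrams, unigrams, vocab_size


def log_prob(word, model):
    """Add-one smoothed log P(word); unseen characters still get nonzero mass."""
    bigrams, unigrams, v = model
    chars = [BOS] + list(word) + [EOS]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + v))
        for a, b in zip(chars, chars[1:])
    )


def candidates(source, rules):
    """Expand every combination of rule options; characters without a rule pass through."""
    options = [rules.get(ch, [ch]) for ch in source]
    return ["".join(combo) for combo in itertools.product(*options)]


def transliterate(source, rules, model, k=3):
    """Return the top-k rule-generated candidates ranked purely by the character LM.

    No transliteration probability is used, so all disambiguation falls
    on the target-side language model.
    """
    cands = candidates(source, rules)
    return sorted(cands, key=lambda c: log_prob(c, model), reverse=True)[:k]


if __name__ == "__main__":
    model = train_char_bigram(["kamal", "kamala", "kalam", "mala"])
    print(transliterate("kamal", RULES, model))
```

With the hypothetical word list above, `kamal` is ranked ahead of `camal` because the bigrams `^k` and `ka` occur in the training words while `^c` and `ca` never do. The same sketch also exposes limitation (ii): any source character missing from `RULES` is simply copied through, so coverage depends entirely on how complete the hand-written table is.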
“…Unsupervised transliteration can be defined as follows: learn a transliteration model $T_X$ from the source language $F$ to the target language $E$, given their monolingual word lists $W_F$ and $W_E$, respectively. We explore this direction in the present work, addressing shortcomings in the previous work (Ravi and Knight, 2009; Chinnakotla et al., 2010).…”
Section: Introduction; mentioning (confidence: 99%)
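One way to make the definition above concrete is a noisy-channel (decipherment-style) objective. This is assumed here for illustration only; the quoted passage does not specify the model. A language model $P_E$ is first estimated from $W_E$ alone. The channel parameters $\theta$ of a hypothetical character mapping model $P_\theta(f \mid e)$ are then chosen to maximize the likelihood of the observed source words, with $e$ ranging over target-language strings. The learned model $T_X$ maps each source word to its most probable target string:

$$
\hat{\theta} = \arg\max_{\theta} \prod_{f \in W_F} \sum_{e} P_E(e)\, P_\theta(f \mid e),
\qquad
T_X(f) = \arg\max_{e} P_E(e)\, P_{\hat{\theta}}(f \mid e)
$$

Only $W_F$ and $W_E$ enter the objective, so no parallel word pairs are needed. Because $e$ is latent, the inner sum is typically handled with EM-style training.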
“…Parallel corpora are used to construct bilingual dictionaries. Chinnakotla et al. [21] proposed combining monolingual resources with a manually created rule base to achieve transliteration performance that is reasonable compared with baseline statistical systems trained on parallel corpora. For many language pairs, parallel corpora are not always available.…”
Section: Corpus Based Translation; mentioning (confidence: 99%)