Proceedings of the 19th International Conference on Computational Linguistics - 2002
DOI: 10.3115/1072228.1072392

Language model adaptation with additional text generated by machine translation

Citations: cited by 7 publications (6 citation statements)
References: 7 publications
“…The collection of textual data in a given language (and for a given domain) is also a hot topic that can be addressed using the Web as a corpus (Le et al, 2003; Cai, 2008) or using machine translation systems to port text corpora from one language to another (Nakajima et al, 2002; Jensson, 2008; Suenderman and Liscombe, 2009; Cucu et al, 2012). However, one faces specific problems, when developing language models for some underresourced languages.…”
Section: Web or Translation-based Text Data Collection
Citation type: mentioning
confidence: 99%
“…Unsupervised language model domain adaptation using SMT (English to Japanese) text was proposed back in 2002 by Nakajima [12]. This paper only reports language model perplexity results, without investigating the implications on a full ASR system.…”
Section: Related Work on SMT-based Domain Adaptation for ASR
Citation type: mentioning
confidence: 99%
“…This issue was recently dealt with for some under-resourced languages such as Thai [7], Amharic [8] and Vietnamese [3]. This is not only true for under-resourced languages, but the collection of textual data in a given language (and for a given domain) is also a hot topic that can be addressed using the Web as a corpus [9,10,11] or using machine translation systems to port text corpora from one language to another [12,13,14].…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…Beyond this, we bootstrap the syntax-based language model using the additional data generated by a syntax-based MT system. To our knowledge, the only previous work addressing this issue is Nakajima et al [2002]. They adapted an n-gram language model with the data generated by a word-based MT system.…”
Section: Related Work
Citation type: mentioning
confidence: 99%