1997 IEEE International Conference on Acoustics, Speech, and Signal Processing
DOI: 10.1109/icassp.1997.596049
Language model adaptation using mixtures and an exponentially decaying cache

Cited by 110 publications (88 citation statements)
References 5 publications
“…The idea of making language models adaptive by introducing a decay function has appeared in various contexts such as speech recognition [22], news retrieval [23], email clustering [24], and collaborative filtering [25]. However, to the best of our knowledge, the effective behaviour and efficient implementation of time-sensitive language modelling for the problem of online term recurrence prediction have not been studied before.…”
Section: Related Work
confidence: 99%
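The decay-based cache adaptation described in the statement above can be sketched as follows. This is an illustrative Python sketch, not the cited paper's exact formulation: the class name, the per-word decay factor, and the interpolation weight are assumptions introduced here for clarity.

```python
from collections import defaultdict

class DecayingCacheLM:
    """Sketch of an exponentially decaying cache language model.
    Recently observed words receive a probability boost that decays
    geometrically with distance. The cache is mixed with a fixed
    background model:  P(w) = (1 - lam) * P_bg(w) + lam * P_cache(w).
    Parameter values here are illustrative, not from the paper."""

    def __init__(self, background, decay=0.99, lam=0.2):
        self.background = background       # dict: word -> P_bg(w)
        self.decay = decay                 # per-step decay factor
        self.lam = lam                     # cache interpolation weight
        self.weights = defaultdict(float)  # exponentially decayed counts
        self.total = 0.0                   # sum of decayed counts

    def prob(self, word):
        p_bg = self.background.get(word, 1e-9)
        p_cache = self.weights[word] / self.total if self.total > 0 else 0.0
        return (1 - self.lam) * p_bg + self.lam * p_cache

    def observe(self, word):
        # Decay all existing cache mass, then credit the new word.
        for w in self.weights:
            self.weights[w] *= self.decay
        self.total *= self.decay
        self.weights[word] += 1.0
        self.total += 1.0
```

Decaying every entry on each observation is O(vocabulary) per step; an efficient implementation (the concern raised in the quoted statement) would instead store a last-update timestamp per word and apply the accumulated decay lazily on lookup.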
“…Depending on the task, for example broadcast news or conversational telephone speech, different individual data sources will be more appropriate. To reduce the mismatch between the interpolated model and the target domain of interest, interpolation weights may be tuned by minimizing the perplexity on some held-out data similar to the target domain (Jelinek and Mercer, 1980; Kneser and Steinbiss, 1993; Iyer et al., 1994; Bahl et al., 1995; Rosenfeld, 1996, 2000; Jelinek, 1997; Clarkson and Robinson, 1997; Kneser and Peters, 1997; Seymore and Rosenfeld, 1997; Iyer and Ostendorf, 1999). These weights indicate the "usefulness" of each source for a particular task.…”
Section: Introduction
confidence: 99%
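The weight-tuning step described above, estimating linear-interpolation weights that minimize held-out perplexity, is commonly done with EM. The sketch below is a generic version of that procedure under the standard Jelinek-Mercer setup; the function name and input format are assumptions made here, not taken from any of the cited papers.

```python
def tune_interpolation_weights(held_out_probs, n_iters=50):
    """EM estimation of linear-interpolation weights that minimize
    perplexity on held-out data.

    held_out_probs: one tuple per held-out token, giving the probability
    each component model assigns to that token. Returns the weight
    vector (sums to 1). Generic sketch, not a specific cited system."""
    m = len(held_out_probs[0])
    lambdas = [1.0 / m] * m          # start from uniform weights
    for _ in range(n_iters):
        # E-step: posterior responsibility of each model for each token
        counts = [0.0] * m
        for probs in held_out_probs:
            mix = sum(l * p for l, p in zip(lambdas, probs))
            for j in range(m):
                counts[j] += lambdas[j] * probs[j] / mix
        # M-step: renormalize responsibilities into new weights
        total = sum(counts)
        lambdas = [c / total for c in counts]
    return lambdas
```

Each EM iteration is guaranteed not to increase held-out perplexity, so the loop converges; a model that consistently assigns higher probability to the held-out tokens ends up with weight near 1.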
“…In this approach, models trained on out-of-domain data can be interpolated with models trained on the last Q sentences that have been processed [10,30,26], and the performance in SMT has been carefully assessed in [47].…”
Section: Adaptation
confidence: 99%