1997 IEEE International Conference on Acoustics, Speech, and Signal Processing
DOI: 10.1109/icassp.1997.596049
Language model adaptation using mixtures and an exponentially decaying cache

Cited by 110 publications (88 citation statements)
References 5 publications
“…The idea of making language models adaptive by introducing a decay function has appeared in various contexts such as speech recognition [22], news retrieval [23], email clustering [24], and collaborative filtering [25]. However, to the best of our knowledge, the effective behaviour and efficient implementation of time-sensitive language modelling for the problem of online term recurrence prediction have not been studied before.…”
Section: Related Work
confidence: 99%
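The decay-based cache adaptation described in the statement above can be sketched as follows. This is an illustrative Python sketch, not the cited paper's exact formulation: the class name, the per-word decay factor, and the interpolation weight are assumptions introduced here for clarity.

```python
from collections import defaultdict

class DecayingCacheLM:
    """Sketch of an exponentially decaying cache language model.
    Recently observed words receive a probability boost that decays
    geometrically with distance. The cache is mixed with a fixed
    background model:  P(w) = (1 - lam) * P_bg(w) + lam * P_cache(w).
    Parameter values here are illustrative, not from the paper."""

    def __init__(self, background, decay=0.99, lam=0.2):
        self.background = background       # dict: word -> P_bg(w)
        self.decay = decay                 # per-step decay factor
        self.lam = lam                     # cache interpolation weight
        self.weights = defaultdict(float)  # exponentially decayed counts
        self.total = 0.0                   # sum of decayed counts

    def prob(self, word):
        p_bg = self.background.get(word, 1e-9)
        p_cache = self.weights[word] / self.total if self.total > 0 else 0.0
        return (1 - self.lam) * p_bg + self.lam * p_cache

    def observe(self, word):
        # Decay all existing cache mass, then credit the new word.
        for w in self.weights:
            self.weights[w] *= self.decay
        self.total *= self.decay
        self.weights[word] += 1.0
        self.total += 1.0
```

Decaying every entry on each observation is O(vocabulary) per step; an efficient implementation (the concern raised in the quoted statement) would instead store a last-update timestamp per word and apply the accumulated decay lazily on lookup.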
“…Depending on the task, for example broadcast news or conversational telephone speech, different individual data sources will be more appropriate. To reduce the mismatch between the interpolated model and the target domain of interest, interpolation weights may be tuned by minimizing the perplexity on some held-out data similar to the target domain (Jelinek and Mercer, 1980; Kneser and Steinbiss, 1993; Iyer et al., 1994; Bahl et al., 1995; Rosenfeld, 1996, 2000; Jelinek, 1997; Clarkson and Robinson, 1997; Kneser and Peters, 1997; Seymore and Rosenfeld, 1997; Iyer and Ostendorf, 1999). These weights indicate the "usefulness" of each source for a particular task.…”
Section: Introduction
confidence: 99%
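The weight-tuning step described above, estimating linear-interpolation weights that minimize held-out perplexity, is commonly done with EM. The sketch below is a generic version of that procedure under the standard Jelinek-Mercer setup; the function name and input format are assumptions made here, not taken from any of the cited papers.

```python
def tune_interpolation_weights(held_out_probs, n_iters=50):
    """EM estimation of linear-interpolation weights that minimize
    perplexity on held-out data.

    held_out_probs: one tuple per held-out token, giving the probability
    each component model assigns to that token. Returns the weight
    vector (sums to 1). Generic sketch, not a specific cited system."""
    m = len(held_out_probs[0])
    lambdas = [1.0 / m] * m          # start from uniform weights
    for _ in range(n_iters):
        # E-step: posterior responsibility of each model for each token
        counts = [0.0] * m
        for probs in held_out_probs:
            mix = sum(l * p for l, p in zip(lambdas, probs))
            for j in range(m):
                counts[j] += lambdas[j] * probs[j] / mix
        # M-step: renormalize responsibilities into new weights
        total = sum(counts)
        lambdas = [c / total for c in counts]
    return lambdas
```

Each EM iteration is guaranteed not to increase held-out perplexity, so the loop converges; a model that consistently assigns higher probability to the held-out tokens ends up with weight near 1.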
“…In this approach, models trained on out-of-domain data can be interpolated with models trained on the last Q sentences that have been processed [10,30,26], and the performance in SMT has been carefully assessed in [47].…”
Section: Adaptation
confidence: 99%