1999
DOI: 10.1109/89.736328

Modeling long distance dependence in language: topic mixtures versus dynamic cache models

Abstract: In this paper, we investigate a new statistical language model which captures topic-related dependencies of words within and across sentences. First, we develop a sentence-level mixture language model that takes advantage of the topic constraints in a sentence or article. Second, we introduce topic-dependent dynamic cache adaptation techniques in the framework of the mixture model. Experiments with the static (or unadapted) mixture model on the 1994 WSJ task indicated a 21% reduction in perplexity and a 3-4% i…
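The sentence-level mixture model described in the abstract scores a sentence as a topic-weighted sum of per-topic n-gram probabilities. A minimal sketch of that idea, in which the toy bigram tables, the mixture weights, and the floor probability for unseen bigrams are all invented for illustration (they are not the paper's trained models):

```python
import math

def sentence_logprob(sentence, topic_lms, topic_weights):
    """log P(s) = log sum_k w_k * prod_i P_k(w_i | w_{i-1})."""
    per_topic = []
    for lm, w in zip(topic_lms, topic_weights):
        lp = math.log(w)
        prev = "<s>"
        for word in sentence:
            # tiny floor stands in for real smoothing of unseen bigrams
            lp += math.log(lm.get((prev, word), 1e-6))
            prev = word
        per_topic.append(lp)
    m = max(per_topic)  # log-sum-exp for numerical stability
    return m + math.log(sum(math.exp(x - m) for x in per_topic))

# two toy topic-conditional bigram tables
finance = {("<s>", "stocks"): 0.4, ("stocks", "rose"): 0.5}
sports  = {("<s>", "stocks"): 0.01, ("stocks", "rose"): 0.05}
lp = sentence_logprob(["stocks", "rose"], [finance, sports], [0.5, 0.5])
print(lp)  # close to the finance-topic score, since that topic dominates
```

Because the mixture is taken at the sentence level rather than per word, a sentence that fits one topic well is scored almost entirely by that topic's model, as the example shows.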

Cited by 108 publications (65 citation statements)
References 24 publications
“…The topics can be known beforehand, or they can be obtained in an unsupervised way by clustering the different words or sequences. The clustering criterion is usually the optimization of an appropriate distance between clusters (Bellegarda, 2000; Chen et al., 2001; Iyer & Ostendorf, 1999). A different context-dependent analysis arises when using the most recent information provided by the user of the system (that is, the recognition hypotheses of the previous interactions).…”
Section: Applications of LM Adaptation
confidence: 99%
“…it has been proposed to build even the content-specific LMs using the information gathered up to the current interaction, that is, the whole preceding sequence of words. This approach, referred to as dynamic cache modelling (Iyer & Ostendorf, 1999; Jelinek, Merialdo, Roukos, & Strauss, 1991; Kuhn & de Mori, 1990), relies on the fact that, within a specific domain, if a certain word or word sequence has appeared, it is more likely to appear again in the short term. Instead of estimating a LM for the whole content of the cache, it has been shown (Lobacheva, 2000; Rosenfeld, 1994) that using only the content words related to the current topic yields better results, since function words (such as prepositions, articles, and so on) are expected to be common across all topics.…”
Section: Model Interpolation
confidence: 99%
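The cache idea in the excerpt above can be sketched briefly. Here the stop-word list, the uniform stand-in for the static model, and the interpolation weight of 0.2 are all illustrative assumptions, not values from the cited papers:

```python
from collections import Counter

# assumed function-word stop list; a real system would use a full one
FUNCTION_WORDS = {"the", "a", "of", "in", "and", "to"}

def cache_prob(word, history, vocab_size, lam=0.2):
    """P(w) = lam * P_cache(w) + (1 - lam) * P_static(w).

    The cache is a unigram distribution over recent *content* words only,
    since function words recur in every topic and carry no topic signal.
    A uniform distribution stands in for the static n-gram model here.
    """
    content = [w for w in history if w not in FUNCTION_WORDS]
    counts = Counter(content)
    p_cache = counts[word] / len(content) if content else 0.0
    p_static = 1.0 / vocab_size
    return lam * p_cache + (1 - lam) * p_static

history = ["the", "market", "fell", "as", "market", "volatility", "rose"]
p = cache_prob("market", history, vocab_size=10000)
print(p)  # boosted well above the 1/10000 static baseline
```

A recently seen content word like "market" gets a much higher probability than the static baseline alone would assign, which is exactly the short-term recurrence effect the excerpt describes.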
“…Topic-dependent modeling has proven to be an effective way to improve the quality of models in speech recognition (Iyer and Ostendorf [1]; Carter [2]). Recently, experiments in the field of machine translation (Hasan and Ney [3]; Yamamoto and Sumita [4]; Finch et al. [5]; Foster and Kuhn [6]) have shown that class-specific models are also useful for translation.…”
Section: Introduction
confidence: 99%
“…The method presented by Mikolov et al. (2011) is based on a combination (in the form of linear interpolation) of advanced language modeling techniques such as the class-based model, the cache model, the maximum entropy model, the structured LM and others. The results of Iyer and Ostendorf (1999) suggest modelling long-distance dependence using a topic mixture model.…”
confidence: 99%
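The linear interpolation that this excerpt (and the model-interpolation excerpt above) refers to is simply a convex combination of component probabilities for the same next word. A minimal sketch, with invented component probabilities and weights:

```python
def interpolate(probs, weights):
    """Convex combination of LM component probabilities for one word.

    Weights must sum to 1 so the result remains a valid probability.
    """
    assert abs(sum(weights) - 1.0) < 1e-9, "weights must sum to 1"
    return sum(p * w for p, w in zip(probs, weights))

# e.g. static n-gram, cache, and topic-mixture estimates for the same word;
# the numbers are purely illustrative
p = interpolate([0.012, 0.030, 0.020], [0.6, 0.3, 0.1])
print(p)
```

In practice the weights are typically tuned on held-out data (for instance via the EM algorithm), so that stronger components dominate the mixture.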