1999
DOI: 10.1109/89.736328

Modeling long distance dependence in language: topic mixtures versus dynamic cache models

Abstract: In this paper, we investigate a new statistical language model which captures topic-related dependencies of words within and across sentences. First, we develop a sentence-level mixture language model that takes advantage of the topic constraints in a sentence or article. Second, we introduce topic-dependent dynamic cache adaptation techniques in the framework of the mixture model. Experiments with the static (or unadapted) mixture model on the 1994 WSJ task indicated a 21% reduction in perplexity and a 3-4% i…
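The sentence-level mixture model described in the abstract scores a sentence as a topic-weighted sum of per-topic n-gram probabilities. A minimal sketch of that idea, in which the toy bigram tables, the mixture weights, and the floor probability for unseen bigrams are all invented for illustration (they are not the paper's trained models):

```python
import math

def sentence_logprob(sentence, topic_lms, topic_weights):
    """log P(s) = log sum_k w_k * prod_i P_k(w_i | w_{i-1})."""
    per_topic = []
    for lm, w in zip(topic_lms, topic_weights):
        lp = math.log(w)
        prev = "<s>"
        for word in sentence:
            # tiny floor stands in for real smoothing of unseen bigrams
            lp += math.log(lm.get((prev, word), 1e-6))
            prev = word
        per_topic.append(lp)
    m = max(per_topic)  # log-sum-exp for numerical stability
    return m + math.log(sum(math.exp(x - m) for x in per_topic))

# two toy topic-conditional bigram tables
finance = {("<s>", "stocks"): 0.4, ("stocks", "rose"): 0.5}
sports  = {("<s>", "stocks"): 0.01, ("stocks", "rose"): 0.05}
lp = sentence_logprob(["stocks", "rose"], [finance, sports], [0.5, 0.5])
print(lp)  # close to the finance-topic score, since that topic dominates
```

Because the mixture is taken at the sentence level rather than per word, a sentence that fits one topic well is scored almost entirely by that topic's model, as the example shows.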

Cited by 108 publications (65 citation statements)
References 24 publications
“…The topics can be known beforehand, or they can be obtained in an unsupervised way by clustering the different words or sequences. The clustering criterion is usually the optimization of an appropriate distance between clusters (Bellegarda, 2000; Chen et al., 2001; Iyer & Ostendorf, 1999). A different context-dependent analysis arises when using the most recent information provided by the user of the system (that is, the recognition hypotheses of the previous interactions).…”
Section: Applications of LM Adaptation
confidence: 99%
“…it has been proposed to build even the content-specific LMs using the information gathered up to the current interaction, that is, the whole preceding sequence of words. This approach, referred to as dynamic cache modelling (Iyer & Ostendorf, 1999; Jelinek, Merialdo, Roukos, & Strauss, 1991; Kuhn & de Mori, 1990), relies on the fact that, within a specific domain, if a certain word or word sequence has appeared, it is more likely to appear again in the short term. Instead of estimating a LM for the whole content of the cache, it has been shown (Lobacheva, 2000; Rosenfeld, 1994) that using only the content words related to the current topic yields better results, since function words (such as prepositions, articles, and so on) are expected to be common across all topics.…”
Section: Model Interpolation
confidence: 99%
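The cache idea in the excerpt above can be sketched briefly. Here the stop-word list, the uniform stand-in for the static model, and the interpolation weight of 0.2 are all illustrative assumptions, not values from the cited papers:

```python
from collections import Counter

# assumed function-word stop list; a real system would use a full one
FUNCTION_WORDS = {"the", "a", "of", "in", "and", "to"}

def cache_prob(word, history, vocab_size, lam=0.2):
    """P(w) = lam * P_cache(w) + (1 - lam) * P_static(w).

    The cache is a unigram distribution over recent *content* words only,
    since function words recur in every topic and carry no topic signal.
    A uniform distribution stands in for the static n-gram model here.
    """
    content = [w for w in history if w not in FUNCTION_WORDS]
    counts = Counter(content)
    p_cache = counts[word] / len(content) if content else 0.0
    p_static = 1.0 / vocab_size
    return lam * p_cache + (1 - lam) * p_static

history = ["the", "market", "fell", "as", "market", "volatility", "rose"]
p = cache_prob("market", history, vocab_size=10000)
print(p)  # boosted well above the 1/10000 static baseline
```

A recently seen content word like "market" gets a much higher probability than the static baseline alone would assign, which is exactly the short-term recurrence effect the excerpt describes.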
“…Topic-dependent modeling has proven to be an effective way to improve the quality of models in speech recognition (Iyer and Ostendorf [1]; Carter [2]). Recently, experiments in the field of machine translation (Hasan and Ney [3]; Yamamoto and Sumita [4]; Finch et al. [5]; Foster and Kuhn [6]) have shown that class-specific models are also useful for translation.…”
Section: Introduction
confidence: 99%
“…The method presented by Mikolov et al. (2011) is based on a combination (in the form of linear interpolation) of advanced language modeling techniques such as the class-based model, the cache model, the maximum entropy model, the structured LM and others. The results of Iyer and Ostendorf (1999) suggest modelling long-distance dependence using a topic mixture model.…”
confidence: 99%
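The linear interpolation that this excerpt (and the model-interpolation excerpt above) refers to is simply a convex combination of component probabilities for the same next word. A minimal sketch, with invented component probabilities and weights:

```python
def interpolate(probs, weights):
    """Convex combination of LM component probabilities for one word.

    Weights must sum to 1 so the result remains a valid probability.
    """
    assert abs(sum(weights) - 1.0) < 1e-9, "weights must sum to 1"
    return sum(p * w for p, w in zip(probs, weights))

# e.g. static n-gram, cache, and topic-mixture estimates for the same word;
# the numbers are purely illustrative
p = interpolate([0.012, 0.030, 0.020], [0.6, 0.3, 0.1])
print(p)
```

In practice the weights are typically tuned on held-out data (for instance via the EM algorithm), so that stronger components dominate the mixture.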