1990
DOI: 10.1109/34.56193
A cache-based natural language model for speech recognition

Abstract: Speech recognition systems must often decide between competing ways of breaking up the acoustic input into strings of words. Since the possible strings may be acoustically similar, a language model is required; given a word string, the model returns its linguistic probability. This thesis discusses several Markov language models. Subsequently, we present a new kind of language model which…

Cited by 370 publications (233 citation statements)
References 8 publications
“…However, in this paper, we show that structure reuse is one possible way in which the independence assumption is broken. A simple and principled approach to handling structure re-use is to use adaptation probabilities for probabilistic grammar rules, analogous to cache probabilities used in caching language models (Rosenfeld, Wasserman, Cai, & Zhu, 1999; Kuhn & Mori, 1990), which is what we proposed in this paper.…”
Section: Discussion
confidence: 99%
“…The importance of comprehension priming at the lexical level has also been noted by the speech recognition community (Kuhn & Mori, 1990), who use so-called caching language models to improve the performance of speech comprehension software. The concept of caching language models is quite simple: a cache of recently seen words is maintained, and the probability of words in the cache is higher than those outside the cache.…”
Section: Adaptation
confidence: 90%
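The cache mechanism described in the statement above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the class name, the interpolation weight `lam`, and the cache size are illustrative choices, and the static model is reduced to a unigram lookup table.

```python
from collections import deque

class CacheLM:
    """Minimal sketch of a unigram cache language model.

    Recently observed words receive boosted probability via a cached
    relative-frequency component, linearly interpolated with a static
    model. `lam` and `cache_size` are illustrative, not fitted values.
    """

    def __init__(self, static_probs, cache_size=200, lam=0.1):
        self.static = static_probs            # dict: word -> P_static(word)
        self.cache = deque(maxlen=cache_size)  # sliding window of recent words
        self.lam = lam                         # weight on the cache component

    def prob(self, word):
        # Cache component: relative frequency of the word in the recent window
        p_cache = self.cache.count(word) / len(self.cache) if self.cache else 0.0
        # Linear interpolation of static and cache components
        return (1 - self.lam) * self.static.get(word, 1e-6) + self.lam * p_cache

    def observe(self, word):
        # Record the word so later occurrences of it get boosted
        self.cache.append(word)
```

Observing a word repeatedly raises its probability relative to the static estimate, which is exactly the priming effect the citing authors describe.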
“…One means of injecting long-range awareness into a language model is by retaining a cache of the most recently seen n-grams which is combined (typically by linear interpolation) with the static model (Jelinek et al., 1991; Kuhn & de Mori, 1990). Another approach, using maximum entropy methods, introduces parameters for trigger pairs of mutually informative words, so that occurrences of certain words in recent context boost the probabilities of the words that they trigger (Lau, Rosenfeld, & Roukos, 1993).…”
Section: Some Doctors Are More Skilled At Doing the Procedures Than Ot…
confidence: 99%
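The trigger-pair idea mentioned above can be illustrated with a simple log-linear boost: when a trigger word appears in recent context, the probability of its target word is multiplied by an exponential feature weight and the distribution is renormalized. This is a toy sketch of the mechanism only; the trigger pairs, the weight `alpha`, and the function name are invented for illustration, not fitted maximum-entropy parameters from Lau et al. (1993).

```python
import math

def trigger_boosted_probs(static_probs, recent_words, triggers, alpha=0.5):
    """Boost P(target) when its trigger word occurred in recent context.

    static_probs: dict word -> static probability
    recent_words: list of words in the recent context window
    triggers:     dict trigger_word -> target_word (illustrative pairs)
    alpha:        illustrative log-linear feature weight
    """
    recent = set(recent_words)
    scores = {}
    for word, p in static_probs.items():
        # Sum feature weights for every trigger of this word seen recently
        boost = sum(alpha for trig, target in triggers.items()
                    if target == word and trig in recent)
        scores[word] = p * math.exp(boost)  # log-linear combination
    z = sum(scores.values())                # renormalize to a distribution
    return {word: s / z for word, s in scores.items()}
```

For example, seeing "bank" in recent context would raise the probability of a paired target like "loan" above its static estimate, while the distribution still sums to one.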
“…In this approach, models trained on out-of-domain data can be interpolated with models trained on the last Q sentences that have been processed [10,30,26] and the performance in SMT has been carefully assessed in [47].…”
Section: Adaptation
confidence: 99%