1994
DOI: 10.21236/ada458711

A Hybrid Approach to Adaptive Statistical Language Modeling

Abstract: We describe our latest attempt at adaptive language modeling. At the heart of our approach is a Maximum Entropy (ME) model which incorporates many knowledge sources in a consistent manner. The other components are a selective unigram cache, a conditional bigram cache, and a conventional static trigram. We describe the knowledge sources used to build such a model with ARPA's official WSJ corpus, and report on perplexity and word error rate results obtained with it. Then, three different adaptation paradigms ar…
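The abstract describes combining a static trigram with cache components. The sketch below shows one plausible way such component estimates could be blended; the linear interpolation, the fixed weights, and the function name are illustrative assumptions for exposition, not the paper's actual formulation, which centers on the ME model.

```python
# A minimal sketch of combining component language-model estimates by
# linear interpolation. The three-way split and fixed weights are
# illustrative assumptions, not the paper's actual combination scheme.

def interpolate(p_trigram: float, p_unigram_cache: float,
                p_bigram_cache: float,
                weights=(0.7, 0.2, 0.1)) -> float:
    """Linearly combine component estimates of P(w | h); weights sum to 1."""
    w_tri, w_uni, w_bi = weights
    return w_tri * p_trigram + w_uni * p_unigram_cache + w_bi * p_bigram_cache

# Example: hypothetical component probabilities for one next-word candidate.
print(interpolate(0.012, 0.030, 0.005))  # -> 0.0149
```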

Cited by 76 publications (19 citation statements)
References 4 publications
“…Since the function words are well estimated due to their high frequency in the training data, we considered a content word unigram cache. This is similar to the approach suggested by Rosenfeld [3] where only rare words observed in a document are cached, rare being defined by the frequency of the word in the training corpus. We observed that defining "rare" as a content word alleviated the problem of deciding a threshold frequency below which a word is considered rare, as well as gave us small but consistent improvements in performance over the frequency-based rare-word cache.…”
Section: Dynamic N-gram Cache Adaptation (mentioning)
confidence: 86%
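The content-word cache idea in the statement above lends itself to a short sketch. Everything here (the function-word stoplist, the add-alpha smoothing, and the vocabulary size) is an illustrative assumption rather than the cited system's actual implementation:

```python
from collections import Counter

# Sketch of a content-word unigram cache: only words outside a small
# function-word list are cached, sidestepping the need for a frequency
# threshold that defines "rare". Stoplist and smoothing are assumptions.
FUNCTION_WORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "it"}

class ContentWordUnigramCache:
    def __init__(self):
        self.counts = Counter()
        self.total = 0

    def add(self, word):
        """Cache a word from the current document if it is a content word."""
        if word.lower() not in FUNCTION_WORDS:
            self.counts[word] += 1
            self.total += 1

    def prob(self, word, alpha=0.5, vocab_size=20000):
        """Smoothed cache estimate (add-alpha over an assumed vocabulary)."""
        return (self.counts[word] + alpha) / (self.total + alpha * vocab_size)

cache = ContentWordUnigramCache()
for w in "the model adapts to the document topic model".split():
    cache.add(w)
print(cache.prob("model"))  # a cached content word gets boosted mass
```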
“…We observed that defining "rare" as a content word alleviated the problem of deciding a threshold frequency below which a word is considered rare, as well as gave us small but consistent improvements in performance over the frequency-based rare-word cache. We also worked with a conditional bigram/trigram cache [3], which is used in addition to the content word unigram cache.…”
Section: Dynamic N-gram Cache Adaptation (mentioning)
confidence: 99%
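The conditional bigram cache mentioned above can be sketched as follows. The reading assumed here is that the cache estimate is used only when the conditioning word has itself occurred in the current document; the hit test and the maximum-likelihood estimate are illustrative assumptions, not the cited system's exact mechanism:

```python
from collections import Counter, defaultdict

# Sketch of a conditional bigram cache: document-local bigram counts
# are consulted only when the conditioning word has been observed in
# the current document; otherwise the caller backs off to the static
# model. Details are illustrative assumptions.
class ConditionalBigramCache:
    def __init__(self):
        self.bigrams = defaultdict(Counter)
        self.seen = Counter()

    def add(self, prev, word):
        self.bigrams[prev][word] += 1
        self.seen[prev] += 1
        self.seen[word] += 1

    def prob(self, word, prev):
        """Cache estimate of P(word | prev), or None if prev is unseen."""
        if self.seen[prev] == 0:
            return None  # back off to the static model
        total = sum(self.bigrams[prev].values())
        return self.bigrams[prev][word] / total if total else None

cache = ConditionalBigramCache()
for prev, word in [("interest", "rates"), ("interest", "rates"),
                   ("interest", "rate")]:
    cache.add(prev, word)
print(cache.prob("rates", "interest"))   # -> 2/3
print(cache.prob("rates", "exchange"))   # -> None (unseen context)
```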
“…The maximum entropy classifier, also known as a logistic regression classifier, is called a discriminative approach as it is based on the model of the conditional distribution P(c|s). Maximum entropy is widely used for many natural language processing tasks like text segmentation [30], part-of-speech tagging [31], language modelling [32], text classification [33] and Named Entity Recognition (NER) [9,10]. The principle behind the maximum entropy approach is to model all that is known and assume nothing about what is unknown [34].…”
Section: Methods (mentioning)
confidence: 99%
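As background to the quoted statement, here is a minimal sketch of a maximum entropy (logistic regression) classifier modelling P(c|s) directly, using scikit-learn; the toy corpus and labels are invented for illustration:

```python
# Minimal sketch: a maximum entropy (logistic regression) text
# classifier that models the conditional distribution P(c | s).
# The training texts and labels below are invented toy data.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

texts = ["stocks fell sharply today",
         "the team won the final match",
         "bond yields rose on inflation data",
         "the striker scored twice"]
labels = ["finance", "sports", "finance", "sports"]

vec = CountVectorizer()
X = vec.fit_transform(texts)
clf = LogisticRegression().fit(X, labels)

test = vec.transform(["yields and stocks moved today"])
print(clf.predict(test))        # likely ['finance'] given the shared terms
print(clf.predict_proba(test))  # conditional P(c | s) for each class
```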
“…Evidence from different knowledge sources can be combined in an attempt to optimize the selection of correct hypotheses; see e.g. Alshawi and Carter (1994); Rayner et al (1994); Rosenfeld (1994).…”
Section: Introduction (mentioning)
confidence: 99%