Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181)
DOI: 10.1109/icassp.1998.675362

Just-in-time language modelling

Abstract: Traditional approaches to language modelling have relied on a fixed corpus of text to inform the parameters of a probability distribution over word sequences. Increasing the corpus size often leads to better-performing language models, but no matter how large, the corpus is a static entity, unable to reflect information about events which postdate it. In these pages we introduce an online paradigm which interleaves the estimation and application of a language model. We present a Bayesian approach to online lan…
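The abstract describes interleaving the estimation and application of a language model, with freshly gathered text informing predictions as they are made; the Bayesian formulation itself is cut off above. As a loose illustration of that interleaving only (not the paper's actual method; all class and parameter names below are hypothetical), a fixed background bigram model can be interpolated with counts collected just in time from newly retrieved text:

```python
from collections import Counter

class JustInTimeBigramLM:
    """Illustrative sketch: a static bigram model interpolated with
    counts gathered on the fly from freshly retrieved text."""

    def __init__(self, static_bigrams, static_unigrams, prior_weight=0.9):
        # Counts from the fixed training corpus (Counter of pairs / words).
        self.static_bigrams = static_bigrams
        self.static_unigrams = static_unigrams
        # Counts accumulated online, "just in time".
        self.fresh_bigrams = Counter()
        self.fresh_unigrams = Counter()
        # Weight on the static model; plays the role of prior confidence.
        self.prior_weight = prior_weight

    def observe(self, tokens):
        """Fold a newly retrieved token list into the online counts."""
        self.fresh_unigrams.update(tokens)
        self.fresh_bigrams.update(zip(tokens, tokens[1:]))

    def prob(self, w2, w1):
        """Interpolate static and just-in-time maximum-likelihood estimates."""
        def mle(bigrams, unigrams):
            denom = unigrams[w1]
            return bigrams[(w1, w2)] / denom if denom else 0.0

        p_static = mle(self.static_bigrams, self.static_unigrams)
        p_fresh = mle(self.fresh_bigrams, self.fresh_unigrams)
        return self.prior_weight * p_static + (1 - self.prior_weight) * p_fresh
```

In use, text relevant to the current context would be retrieved and passed to observe() before each call to prob(), so that estimation and application alternate.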

Cited by 41 publications (28 citation statements)
References 6 publications (2 reference statements)
“…In this work, we showed that a Gaussian prior can be used to smooth maximum entropy n-gram models to achieve performance equal to or superior to that of all other techniques for⁶ [footnote 6: The optimal variances Nσ_m² for the Gaussian prior found by the Powell search were mostly in the range 1.5 < Nσ_m² < 5, where N is the size of the training set. Multiplying by N converts the variances from probability space to count space, and discounts are relatively constant in count space over different training set sizes.]…”
Section: Discussion (mentioning)
confidence: 92%
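The statement above concerns smoothing a maximum entropy n-gram model with a Gaussian prior on its feature weights. In the usual exponential-family formulation (standard notation, not necessarily the cited paper's), that prior adds a quadratic penalty to the training log-likelihood, with the per-feature variance σ_j² controlling how strongly each weight λ_j is pulled toward zero:

```latex
\mathcal{L}(\Lambda)
  \;=\; \sum_{i} \log p_{\Lambda}(w_i \mid h_i)
        \;-\; \sum_{j} \frac{\lambda_j^{2}}{2\sigma_j^{2}},
\qquad
p_{\Lambda}(w \mid h)
  \;=\; \frac{\exp\bigl(\textstyle\sum_j \lambda_j f_j(h, w)\bigr)}
             {\sum_{w'} \exp\bigl(\textstyle\sum_j \lambda_j f_j(h, w')\bigr)}
```

Under this reading, the quoted range 1.5 < Nσ_m² < 5 says that once the variances are rescaled by the training-set size N (i.e., expressed in count space), the optimal settings stay roughly constant across corpus sizes.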
“…In the original implementation ME-gauss, a different σ_m is used for each level of the n-gram model.⁶ We also considered using a single σ over the whole model (ME-gauss-1), and using three parameters σ_m,1, σ_m,2, and σ_m,3+ for each level of the n-gram model, to be applied to m-grams with 1, 2, or 3 or more counts in the training data, respectively. This latter parameterization (ME-gauss-3n) is analogous to the parameterization of modified Kneser-Ney smoothing.…”
Section: Results (mentioning)
confidence: 99%
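The ME-gauss variants described above differ only in how the prior variances are tied: one σ_m per n-gram order, a single σ for the whole model, or three per order keyed to training-data counts of 1, 2, and 3+. A minimal sketch of the last scheme's lookup, with hypothetical names and placeholder values (none of them fitted settings from the cited paper):

```python
# Hypothetical per-order variance table: order -> (sigma_1, sigma_2, sigma_3plus).
SIGMA_TABLE = {
    1: (1.0, 1.5, 2.0),
    2: (0.8, 1.2, 1.8),
    3: (0.6, 1.0, 1.5),
}

def select_sigma(order, training_count, sigma_table=SIGMA_TABLE):
    """Pick the Gaussian-prior std-dev for an m-gram feature, keyed on its
    n-gram order and on how often it was seen in the training data."""
    sigma_1, sigma_2, sigma_3plus = sigma_table[order]
    if training_count == 1:
        return sigma_1
    if training_count == 2:
        return sigma_2
    return sigma_3plus
```

Tying the variances to count classes in this way mirrors how modified Kneser-Ney uses separate discounts for m-grams seen once, twice, or three or more times.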