2000
DOI: 10.1109/5.880083
Two decades of statistical language modeling: where do we go from here?

Abstract: Statistical Language Models estimate the distribution of various natural language phenomena for the purpose of speech recognition and other language technologies. Since the first significant model was proposed in 1980, many attempts have been made to improve the state of the art. We review them here, point to a few promising directions, and argue for a Bayesian approach to integration of linguistic theories with data.

Cited by 514 publications (328 citation statements)
References 58 publications
“…The second module is Chunked-Off Markov Model [3] training the database with corpus sentences in which all the nouns and named entities are replaced with their respective type. This is implemented using the tagging and chunking operations of NLTK.…”
Section: Chunked-Off Markov Model
confidence: 99%
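The "chunked-off" idea in the statement above can be sketched in plain Python. The type map below is entirely hypothetical; in the cited work the noun and entity types come from NLTK's tagging and chunking operations rather than a hand-written dictionary.

```python
from collections import defaultdict

# Hypothetical noun/entity types; in the cited approach these would be
# produced by NLTK's POS tagging and named-entity chunking.
TYPE_MAP = {
    "Paris": "<LOCATION>",
    "London": "<LOCATION>",
    "Alice": "<PERSON>",
    "Bob": "<PERSON>",
    "ticket": "<NOUN>",
    "train": "<NOUN>",
}

def chunk_off(sentence):
    """Replace each noun or named entity with its type token."""
    return [TYPE_MAP.get(tok, tok) for tok in sentence.split()]

def train_bigram_counts(sentences):
    """Collect bigram counts over type-replaced (chunked-off) sentences."""
    counts = defaultdict(lambda: defaultdict(int))
    for s in sentences:
        toks = ["<s>"] + chunk_off(s) + ["</s>"]
        for prev, cur in zip(toks, toks[1:]):
            counts[prev][cur] += 1
    return counts

corpus = [
    "Alice booked a ticket to Paris",
    "Bob booked a train to London",
]
counts = train_bigram_counts(corpus)
print(counts["<PERSON>"]["booked"])  # both sentences share this bigram
```

Because both corpus sentences collapse onto the same type sequence, the model pools their counts, which is the point of chunking off: sparse surface forms are traded for denser statistics over types.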
“…The basic idea is that we assume there are k latent common themes in all collections. Each is characterized by a multinomial word distribution (also called a unigram language model [10]). We then assume that a document is a sample of a mixture model with these theme models as components.…”
Section: The General Problem
confidence: 99%
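The mixture-of-themes model in the statement above can be written down directly: each of the k themes is a unigram word distribution, and a document's probability is a weighted sum of the per-theme probabilities. The theme distributions and weights below are invented for illustration; in practice they would be estimated, e.g. by EM.

```python
import math

# Two hypothetical theme models (unigram word distributions) with
# equal mixture weights; real values would be learned from data.
themes = [
    {"sports": 0.5, "game": 0.3, "data": 0.1, "model": 0.1},
    {"sports": 0.1, "game": 0.1, "data": 0.4, "model": 0.4},
]
weights = [0.5, 0.5]

def doc_log_likelihood(doc, themes, weights):
    """log p(doc) under a mixture of unigram language models:
    p(doc) = sum_k w_k * prod_{w in doc} p(w | theme_k)."""
    total = 0.0
    for weight, theme in zip(weights, themes):
        p = weight
        for w in doc:
            p *= theme.get(w, 1e-9)  # tiny floor for unseen words
        total += p
    return math.log(total)

doc = ["data", "model", "model"]
ll = doc_log_likelihood(doc, themes, weights)
print(ll)
```

Note that this sketch assigns a whole document to a mixture of themes at the document level; the second theme dominates the likelihood here because the document's words are probable under it.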
“…Language models (LMs) are essential for automatic speech recognition or statistical machine translation (Rosenfeld, 2000). The performance of LMs strongly depends on quality and quantity of their training data.…”
Section: Introduction
confidence: 99%