1994
DOI: 10.21236/ada458711

A Hybrid Approach to Adaptive Statistical Language Modeling

Abstract: We describe our latest attempt at adaptive language modeling. At the heart of our approach is a Maximum Entropy (ME) model which incorporates many knowledge sources in a consistent manner. The other components are a selective unigram cache, a conditional bigram cache, and a conventional static trigram. We describe the knowledge sources used to build such a model with ARPA's official WSJ corpus, and report on perplexity and word error rate results obtained with it. Then, three different adaptation paradigms ar…
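The abstract describes combining a static trigram with cache components. The sketch below shows one plausible way such component estimates could be blended; the linear interpolation, the fixed weights, and the function name are illustrative assumptions for exposition, not the paper's actual formulation, which centers on the ME model.

```python
# A minimal sketch of combining component language-model estimates by
# linear interpolation. The three-way split and fixed weights are
# illustrative assumptions, not the paper's actual combination scheme.

def interpolate(p_trigram: float, p_unigram_cache: float,
                p_bigram_cache: float,
                weights=(0.7, 0.2, 0.1)) -> float:
    """Linearly combine component estimates of P(w | h); weights sum to 1."""
    w_tri, w_uni, w_bi = weights
    return w_tri * p_trigram + w_uni * p_unigram_cache + w_bi * p_bigram_cache

# Example: hypothetical component probabilities for one next-word candidate.
print(interpolate(0.012, 0.030, 0.005))  # -> 0.0149
```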

Cited by 76 publications (19 citation statements)
References 4 publications
“…Since the function words are well estimated due to their high frequency in the training data, we considered a content word unigram cache. This is similar to the approach suggested by Rosenfeld [3] where only rare words observed in a document are cached, rare being defined by the frequency of the word in the training corpus. We observed that defining "rare" as a content word alleviated the problem of deciding a threshold frequency below which a word is considered rare, as well as gave us small but consistent improvements in performance over the frequency-based rare-word cache.…”
Section: Dynamic N-gram Cache Adaptation (mentioning)
confidence: 86%
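The content-word cache idea in the statement above lends itself to a short sketch. Everything here (the function-word stoplist, the add-alpha smoothing, and the vocabulary size) is an illustrative assumption rather than the cited system's actual implementation:

```python
from collections import Counter

# Sketch of a content-word unigram cache: only words outside a small
# function-word list are cached, sidestepping the need for a frequency
# threshold that defines "rare". Stoplist and smoothing are assumptions.
FUNCTION_WORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "it"}

class ContentWordUnigramCache:
    def __init__(self):
        self.counts = Counter()
        self.total = 0

    def add(self, word):
        """Cache a word from the current document if it is a content word."""
        if word.lower() not in FUNCTION_WORDS:
            self.counts[word] += 1
            self.total += 1

    def prob(self, word, alpha=0.5, vocab_size=20000):
        """Smoothed cache estimate (add-alpha over an assumed vocabulary)."""
        return (self.counts[word] + alpha) / (self.total + alpha * vocab_size)

cache = ContentWordUnigramCache()
for w in "the model adapts to the document topic model".split():
    cache.add(w)
print(cache.prob("model"))  # a cached content word gets boosted mass
```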
“…We observed that defining "rare" as a content word alleviated the problem of deciding a threshold frequency below which a word is considered rare, as well as gave us small but consistent improvements in performance over the frequency-based rare-word cache. We also worked with a conditional bigram/trigram cache [3], which is used in addition to the content word unigram cache.…”
Section: Dynamic N-gram Cache Adaptation (mentioning)
confidence: 99%
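The conditional bigram cache mentioned above can be sketched as follows. The reading assumed here is that the cache estimate is used only when the conditioning word has itself occurred in the current document; the hit test and the maximum-likelihood estimate are illustrative assumptions, not the cited system's exact mechanism:

```python
from collections import Counter, defaultdict

# Sketch of a conditional bigram cache: document-local bigram counts
# are consulted only when the conditioning word has been observed in
# the current document; otherwise the caller backs off to the static
# model. Details are illustrative assumptions.
class ConditionalBigramCache:
    def __init__(self):
        self.bigrams = defaultdict(Counter)
        self.seen = Counter()

    def add(self, prev, word):
        self.bigrams[prev][word] += 1
        self.seen[prev] += 1
        self.seen[word] += 1

    def prob(self, word, prev):
        """Cache estimate of P(word | prev), or None if prev is unseen."""
        if self.seen[prev] == 0:
            return None  # back off to the static model
        total = sum(self.bigrams[prev].values())
        return self.bigrams[prev][word] / total if total else None

cache = ConditionalBigramCache()
for prev, word in [("interest", "rates"), ("interest", "rates"),
                   ("interest", "rate")]:
    cache.add(prev, word)
print(cache.prob("rates", "interest"))   # -> 2/3
print(cache.prob("rates", "exchange"))   # -> None (unseen context)
```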
“…The maximum entropy classifier, also known as a logistic regression classifier, is called a discriminative approach as it is based on the model of the conditional distribution P(c|s). Maximum entropy is widely used for many natural language processing tasks like text segmentation [30], part-of-speech tagging [31], language modelling [32], text classification [33] and Named Entity Recognition (NER) [9,10]. The principle behind the maximum entropy approach is to model all that is known and assume nothing about what is unknown [34].…”
Section: Methods (mentioning)
confidence: 99%
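As background to the quoted statement, here is a minimal sketch of a maximum entropy (logistic regression) classifier modelling P(c|s) directly, using scikit-learn; the toy corpus and labels are invented for illustration:

```python
# Minimal sketch: a maximum entropy (logistic regression) text
# classifier that models the conditional distribution P(c | s).
# The training texts and labels below are invented toy data.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

texts = ["stocks fell sharply today",
         "the team won the final match",
         "bond yields rose on inflation data",
         "the striker scored twice"]
labels = ["finance", "sports", "finance", "sports"]

vec = CountVectorizer()
X = vec.fit_transform(texts)
clf = LogisticRegression().fit(X, labels)

test = vec.transform(["yields and stocks moved today"])
print(clf.predict(test))        # likely ['finance'] given the shared terms
print(clf.predict_proba(test))  # conditional P(c | s) for each class
```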
“…Evidence from different knowledge sources can be combined in an attempt to optimize the selection of correct hypotheses; see e.g. Alshawi and Carter (1994); Rayner et al (1994); Rosenfeld (1994).…”
Section: Introduction (mentioning)
confidence: 99%