Proceedings of the Fourth International Conference on Spoken Language Processing. ICSLP '96
DOI: 10.1109/icslp.1996.607136

Evaluation of a language model using a clustered model backoff

Abstract: In this paper, we describe and evaluate a language model using word classes automatically generated from a word clustering algorithm. Class-based language models have been shown to be effective for rapid adaptation, training on small datasets, and reduced memory usage. In terms of model perplexity, prior work has shown diminished returns for class-based language models constructed using very large training sets. This paper describes a method of using a class model as a backoff to a bigram model which produced …
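The backoff scheme the abstract describes can be sketched roughly as follows. This is a toy illustration with a hand-assigned word-to-class map and a fixed backoff weight, not the paper's automatic clustering algorithm or its estimated backoff weights: seen word bigrams use the maximum-likelihood word estimate, and unseen ones fall back to a class bigram combined with a word-given-class term.

```python
from collections import Counter

# Toy corpus and hand-assigned classes (illustrative assumptions only;
# the paper derives classes from an automatic word clustering algorithm).
corpus = "the cat sat on the mat the dog sat on the rug".split()
word_class = {"the": "DET", "cat": "N", "dog": "N", "mat": "N",
              "rug": "N", "sat": "V", "on": "P"}

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
class_seq = [word_class[w] for w in corpus]
class_unigrams = Counter(class_seq)
class_bigrams = Counter(zip(class_seq, class_seq[1:]))

def p_backoff(w1, w2, alpha=0.4):
    """Word-bigram probability with a class-bigram backoff:
    seen pairs use the ML word-bigram estimate; unseen pairs back off
    to P(c2 | c1) * P(w2 | c2), scaled by a fixed weight alpha."""
    if bigrams[(w1, w2)] > 0:
        return bigrams[(w1, w2)] / unigrams[w1]
    c1, c2 = word_class[w1], word_class[w2]
    p_class = class_bigrams[(c1, c2)] / class_unigrams[c1] if class_unigrams[c1] else 0.0
    p_word_given_class = unigrams[w2] / class_unigrams[c2] if class_unigrams[c2] else 0.0
    return alpha * p_class * p_word_given_class

print(p_backoff("the", "cat"))  # seen word bigram: ML estimate
print(p_backoff("cat", "the"))  # unseen word bigram: class backoff
```

In a real system the fixed `alpha` would be replaced by properly normalized discount mass, but the sketch shows the core idea: the class model supplies a nonzero estimate precisely where the word bigram has no counts.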

Cited by 7 publications (2 citation statements)
References 12 publications
“…Related research, carried out independently and concurrently, has recently been reported in Miller and Alleva (1996). It describes a method that allows backoffs to occur from a word- to a category-based bigram language model.…”
Section: Backing-off From Word- to Category-based N-grams
Confidence: 99%
“…A better approach is to build a language model general enough to better estimate unseen and low-frequency events, but specific enough to capture the ambiguous nature of words. Much work has been done on this; the widely used techniques are interpolating class-based LMs with word-based LMs [4,5,6], and backing off from word-based LMs to class-based LMs when estimating the probabilities of unseen events [7,8]. Performance of the LMs depends to a certain extent on the number of clusters.…”
Section: Introduction
Confidence: 99%
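The interpolation alternative mentioned in the excerpt above (as opposed to backoff) can be sketched in a few lines. This is a minimal illustration with a hand-assigned class map and a fixed interpolation weight, not any cited paper's estimation procedure: the word-bigram and class-bigram estimates are always linearly combined, rather than the class model being consulted only for unseen events.

```python
from collections import Counter

# Toy corpus and hand-assigned word classes (illustrative assumptions).
corpus = "a cat sat on a mat a dog sat on a rug".split()
word_class = {"a": "DET", "cat": "N", "dog": "N", "mat": "N",
              "rug": "N", "sat": "V", "on": "P"}

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
classes = [word_class[w] for w in corpus]
c_unigrams = Counter(classes)
c_bigrams = Counter(zip(classes, classes[1:]))

def p_interp(w1, w2, lam=0.7):
    """Linear interpolation of word- and class-bigram estimates:
    lam * P(w2 | w1) + (1 - lam) * P(c2 | c1) * P(w2 | c2)."""
    p_word = bigrams[(w1, w2)] / unigrams[w1]
    c1, c2 = word_class[w1], word_class[w2]
    p_class = (c_bigrams[(c1, c2)] / c_unigrams[c1]) * (unigrams[w2] / c_unigrams[c2])
    return lam * p_word + (1 - lam) * p_class

print(p_interp("a", "cat"))  # seen pair: both terms contribute
print(p_interp("cat", "a"))  # unseen word pair: class term alone
```

In practice `lam` would be tuned on held-out data (e.g. by EM); the design trade-off against backoff is that interpolation smooths all estimates, while backoff leaves seen-event estimates untouched.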