Style & topic language model adaptation using HMM-LDA

Hsu, Bo-June; Glass, James

doi:10.3115/1610075.1610128

Cited by 45 publications

(22 citation statements)

References 23 publications

“…Tam and Schultz [8] successfully applied the LDA model to unsupervised LM adaptation by interpolating the background LM with the dynamic unigram LM estimated by the LDA model. Hsu and Glass [9] investigated using hidden Markov model with LDA to allow for both topic and style adaptation. Mrva and Woodland [10] achieved a WER reduction on broadcast conversation recognition using an LDA based adaptation approach that effectively combined the LMs trained from corpora with different styles: broadcast news and broadcast conversation data.…”

Section: Related Work (mentioning)

confidence: 99%

Unsupervised language model adaptation via topic modeling based on named entity hypotheses

Liu

2008

2008 IEEE International Conference on Acoustics, Speech and Signal Processing

Language model (LM) adaptation is often achieved by combining a generic LM with a topic-specific model that is more relevant to the target document. Unlike previous work on unsupervised LM adaptation, in this paper we propose to leverage named entity (NE) information for topic analysis and LM adaptation. We investigate two topic modeling approaches, latent Dirichlet allocation (LDA) and clustering, and propose a new mixture topic model for LDA-based LM adaptation. Our experiments for N-best list rescoring have shown that this new adaptation framework using NE information and topic analysis outperforms the baseline generic N-gram LM based on a state-of-the-art Mandarin recognition system.

“…One approach is to label each word in each document with its most dominant topic label as determined by a bag-of-words topic model. From this labeling of the document collection, common multi-word sequences that share the same topic label can be extracted and used for the summarization of the topics in a document or document collection [4,5]. Though effective, this approach still retains the bag-of-words assumption during training, and it is reasonable to assume that topic modeling improvements could be attained if the models directly incorporated knowledge of informative multi-word sequences or phrases.…”

Section: Introduction (mentioning)

confidence: 99%

Modeling multiword phrases with constrained phrase trees for improved topic modeling of conversational speech

Hazen

Richardson

2012

2012 IEEE Spoken Language Technology Workshop (SLT)

Latent topic modeling has proven to be an effective means for learning the underlying semantic content within document collections. Latent topic modeling has traditionally been applied to bag-of-words representations that ignore word sequence information that can aid in semantic understanding. In this work we introduce a method for efficiently incorporating arbitrarily long word sequences into a topic modeling approach. This method iteratively constructs a constrained set of phrase trees in an unsupervised fashion from a document collection using weighted pointwise mutual information statistics to guide the process. In experiments on the Fisher Corpus of conversational speech, the incorporation of learned phrases into a latent topic model yielded significant improvements in the unsupervised discovery of the known topics present within the data.

“…These techniques can also be applied to help automatic speech recognition by adapting the Language Model (LM). For the task of speech recognition in academic lectures, Hsu and Glass [4] used a Hidden Markov Model with LDA (HMM-LDA) [5] which can model content words as well as syntactic words. In our previous work, we developed the Topic Tracking Language Model (TTLM) to explicitly capture the time evolution of topics throughout a recording session [6].…”

Section: Introduction (mentioning)

confidence: 99%

Handling uncertain observations in unsupervised topic-mixture language model adaptation

Chuangsuwanich

Watanabe

Hori

et al. 2012

2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Self Cite

We propose an extension to the recent approaches in topic-mixture modeling such as Latent Dirichlet Allocation and Topic Tracking Model for the purpose of unsupervised adaptation in speech recognition. Instead of using the 1-best input given by the speech recognizer, the proposed model takes a confusion network as input to alleviate recognition errors. We incorporate a selection variable which helps reweight the recognition output, thus creating a more accurate latent topic estimate. Compared to adapting based on just one recognition hypothesis, the proposed model shows WER improvements on two different tasks.
