Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing - EMNLP '06 2006
DOI: 10.3115/1610075.1610128

Style & topic language model adaptation using HMM-LDA

Abstract: Adapting language models across styles and topics, such as for lecture transcription, involves combining generic style models with topic-specific content relevant to the target document. In this work, we investigate the use of the Hidden Markov Model with Latent Dirichlet Allocation (HMM-LDA) to obtain syntactic state and semantic topic assignments to word instances in the training corpus. From these context-dependent labels, we construct style and topic models that better model the target document, and extend…

Cited by 45 publications
(22 citation statements)
References 23 publications
“…Tam and Schultz [8] successfully applied the LDA model to unsupervised LM adaptation by interpolating the background LM with the dynamic unigram LM estimated by the LDA model. Hsu and Glass [9] investigated using hidden Markov model with LDA to allow for both topic and style adaptation. Mrva and Woodland [10] achieved a WER reduction on broadcast conversation recognition using an LDA based adaptation approach that effectively combined the LMs trained from corpora with different styles: broadcast news and broadcast conversation data.…”
Section: Related Work (mentioning)
confidence: 99%
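The interpolation mentioned in the statement above can be illustrated with a small sketch. This is not the method of any cited paper: the function names, the interpolation weight `lam`, and all probabilities are hypothetical, chosen only to show how a background unigram distribution might be mixed with a topic-derived one.

```python
def lda_unigram(topic_weights, topic_word_probs):
    """Mix per-topic word distributions: p(w) = sum_k theta_k * p(w | k)."""
    vocab_size = len(topic_word_probs[0])
    return [
        sum(theta * row[w] for theta, row in zip(topic_weights, topic_word_probs))
        for w in range(vocab_size)
    ]


def interpolate(p_background, p_adapt, lam=0.5):
    """Linear interpolation: lam * p_background + (1 - lam) * p_adapt."""
    return [lam * b + (1.0 - lam) * a for b, a in zip(p_background, p_adapt)]


# Toy setup (made-up numbers): 3-word vocabulary, 2 topics.
theta = [0.75, 0.25]                 # assumed topic weights for the target document
phi = [[0.5, 0.25, 0.25],            # assumed p(word | topic 0)
       [0.0, 0.5, 0.5]]              # assumed p(word | topic 1)
p_bg = [0.25, 0.25, 0.5]             # assumed background unigram LM

p_lda = lda_unigram(theta, phi)
p_adapted = interpolate(p_bg, p_lda, lam=0.5)
```

Because both inputs are proper distributions and the weights sum to one, the interpolated result is also a proper distribution.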
“…One approach is to label each word in each document with its most dominant topic label as determined by a bag-of-words topic model. From this labeling of the document collection, common multi-word sequences that share the same topic label can be extracted and used for the summarization of the topics in a document or document collection [4,5]. Though effective, this approach still retains the bag-of-words assumption during training, and it is reasonable to assume that topic modeling improvements could be attained if the models directly incorporated knowledge of informative multi-word sequences or phrases.…”
Section: Introduction (mentioning)
confidence: 99%
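The phrase-extraction idea in the statement above (label each word with a topic, then collect adjacent words that share a label) can be sketched as follows. The helper name `same_topic_phrases`, the `min_len` threshold, and the toy topic labels are assumptions for illustration, not taken from the cited works.

```python
from itertools import groupby


def same_topic_phrases(words, topics, min_len=2):
    """Return (topic, phrase) pairs for runs of adjacent words sharing a topic label."""
    phrases = []
    # groupby collects consecutive (word, topic) pairs that have the same topic.
    for topic, group in groupby(zip(words, topics), key=lambda pair: pair[1]):
        run = [word for word, _ in group]
        if len(run) >= min_len:
            phrases.append((topic, " ".join(run)))
    return phrases


# Toy labeling (made up): one topic id per word.
words = ["the", "hidden", "markov", "model", "works", "for", "speech", "recognition"]
topics = [0, 1, 1, 1, 3, 0, 2, 2]

phrases = same_topic_phrases(words, topics)
```

In this toy input, only the runs of length two or more are kept, so single words with isolated labels are dropped.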
“…These techniques can also be applied to help automatic speech recognition by adapting the Language Model (LM). For the task of speech recognition in academic lectures, Hsu and Glass [4] used a Hidden Markov Model with LDA (HMM-LDA) [5] which can model content words as well as syntactic words. In our previous work, we developed the Topic Tracking Language Model (TTLM) to explicitly capture the time evolution of topics throughout a recording session [6].…”
Section: Introduction (mentioning)
confidence: 99%