2006
DOI: 10.1007/s10579-006-9007-3

Automatic induction of language model data for a spoken dialogue system

Abstract: When building a new spoken dialogue application, large amounts of domain-specific data are required. This paper addresses the issue of generating in-domain training data when little or no real user data are available. The two-stage approach taken begins with a data induction phase whereby linguistic constructs from out-of-domain sentences are harvested and integrated with artificially constructed in-domain phrases. After some syntactic and semantic filtering, a large corpus of synthetically assembled user utterances…
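
To make the two-stage approach concrete, the following is a minimal sketch of the induction-and-assembly idea. The out-of-domain sentences, content-phrase patterns, slot marker, in-domain phrases, and length/keyword filter are all invented for exposition; they do not stand for the authors' actual grammar, data, or filtering components.

```python
# Illustrative sketch only: the corpus fragments, regular expressions, slot
# marker, and filter below are invented and are not the paper's actual
# grammar, data, or filtering criteria.
import itertools
import re

# Stage 1 (induction): harvest carrier constructs from out-of-domain sentences
# by replacing their content phrases with a generic slot.
OUT_OF_DOMAIN = [
    "could you show me flights to boston please",
    "i would like to book a table for two",
]
CONTENT_PATTERNS = [r"flights to \w+", r"a table for \w+"]
SLOT = "<NP>"

def harvest_templates(sentences, patterns):
    templates = set()
    for sent in sentences:
        for pat in patterns:
            if re.search(pat, sent):
                templates.add(re.sub(pat, SLOT, sent))
    return sorted(templates)

# Artificially constructed in-domain phrases (imagine a weather domain).
IN_DOMAIN_PHRASES = ["the forecast for seattle", "tomorrow's weather in denver"]

# Stage 2 (assembly + filtering): plug in-domain phrases into the harvested
# templates, then keep only utterances that pass a crude length/keyword check,
# standing in for the paper's syntactic and semantic filtering.
def assemble(templates, phrases):
    for template, phrase in itertools.product(templates, phrases):
        utterance = template.replace(SLOT, phrase)
        if 3 <= len(utterance.split()) <= 15 and ("weather" in utterance or "forecast" in utterance):
            yield utterance

if __name__ == "__main__":
    for utterance in assemble(harvest_templates(OUT_OF_DOMAIN, CONTENT_PATTERNS), IN_DOMAIN_PHRASES):
        print(utterance)
```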

Cited by 6 publications (8 citation statements)
References 26 publications (15 reference statements)

“…For statistical language model adaptation there is a large body of work (see [11] for a review). Recently there have been a few studies oriented toward dialogue systems, e.g., [12,13,14]. One of the recent trends is to exploit web data [15,13], esp.…”
Section: Related Work (mentioning)
confidence: 99%
“…But data from the web is rarely truly in-domain; at best it is near in-domain. Wang et al. [14] transformed out-of-domain data and then applied filtering and re-sampling to obtain near in-domain data. We use no out-of-domain data and obtain data only through induction and then generation.…”
Section: Related Work (mentioning)
confidence: 99%
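
The filtering and re-sampling step referred to in the statement above can be pictured with a small sketch. The in-domain seed, candidate sentences, add-one unigram scorer, threshold, and sample size below are assumptions made purely for illustration, not the method's actual scoring or selection criteria.

```python
# Hypothetical illustration of filtering and re-sampling near in-domain text;
# the seed data, unigram scorer, threshold, and sample size are made up.
import math
import random
from collections import Counter

IN_DOMAIN_SEED = ["what is the weather in boston", "forecast for tomorrow"]
CANDIDATES = [
    "show me the weather report for seattle",
    "the stock closed higher on friday",
    "is it going to rain tomorrow in boston",
]

# Add-one smoothed unigram model estimated from the tiny in-domain seed.
counts = Counter(word for sent in IN_DOMAIN_SEED for word in sent.split())
total = sum(counts.values())
vocab_size = len(counts)

def per_word_logprob(sentence):
    words = sentence.split()
    return sum(math.log((counts[w] + 1) / (total + vocab_size + 1)) for w in words) / len(words)

# Filter: keep candidates whose per-word score clears an (arbitrary) threshold;
# re-sample: draw the adaptation corpus with probability proportional to score.
scored = [(s, per_word_logprob(s)) for s in CANDIDATES]
kept = [(s, lp) for s, lp in scored if lp > -2.7]
if kept:
    weights = [math.exp(lp) for _, lp in kept]
    adaptation_corpus = random.choices([s for s, _ in kept], weights=weights, k=5)
    print(adaptation_corpus)
```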
“…There is some LMA (language model adaptation) work oriented toward dialogue systems [10], [11], [12]. In our experiments, a trigram model was trained on the large text collection (T).…”
Section: Language Model Adaptation (mentioning)
confidence: 99%
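
For readers unfamiliar with the terminology, training "a trigram model on the large text collection (T)" reduces to estimating smoothed trigram probabilities from counts. The sketch below does this with a toy collection and add-one smoothing, which stand in for whatever corpus and smoothing scheme the cited experiments actually used.

```python
# Minimal count-based trigram language model; the toy collection T and the
# add-one smoothing are placeholders, not the cited experimental setup.
from collections import defaultdict

T = [
    "show me flights from boston to denver",
    "show me the weather in boston",
]

trigram_counts = defaultdict(int)
bigram_counts = defaultdict(int)
vocab = set()

for sentence in T:
    tokens = ["<s>", "<s>"] + sentence.split() + ["</s>"]
    vocab.update(tokens)
    for i in range(2, len(tokens)):
        trigram_counts[(tokens[i - 2], tokens[i - 1], tokens[i])] += 1
        bigram_counts[(tokens[i - 2], tokens[i - 1])] += 1

def trigram_prob(w1, w2, w3):
    # Add-one smoothed estimate of P(w3 | w1, w2).
    return (trigram_counts[(w1, w2, w3)] + 1) / (bigram_counts[(w1, w2)] + len(vocab))

print(trigram_prob("show", "me", "flights"))  # seen trigram, higher probability
print(trigram_prob("show", "me", "weather"))  # unseen continuation, smoothed
```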
“…E.g., Wang et al. took a two-step approach to generate in-domain data through out-of-domain data transformation [12]. [15] extracts useful information from previous domains and the World Wide Web without any in-domain data.…”
Section: Introduction (mentioning)
confidence: 99%