Automatic induction of n-gram language models from a natural language grammar

Seneff, Stephanie; Wang, Chao; Hazen, Timothy J.

doi:10.21437/eurospeech.2003-266

Cited by 12 publications

(1 citation statement)

References 11 publications


“…The acoustic models for the speech recognition system, SUMMIT, are trained with a combination of two data corpora, "Yinhe" [8] and "MAT2000" [9], both of which contain Mandarin Chinese speech data from native speakers. The class n-gram language model is trained [10,11] by parsing a corpus using TINA [12]. Since we do not yet have a training corpus from users, we make use of the English corpus from CityBrowser I.…”

Section: Speech Recognition and Synthesis (mentioning)

confidence: 99%

Citybrowser II: A Multimodal Restaurant Guide in Mandarin

Liu, Xu, Seneff et al. 2008

Int. Symp. On Chinese Spoken Language Processing


In this paper we present a conversational dialogue system, CityBrowser II, which allows users to inquire about information about restaurants in Mandarin. Developed in the Galaxy infrastructure with a common, language-independent semantic representation, CityBrowser integrates portability and scalability. By inheriting the infrastructure and main language understanding/generation components from its English predecessor, CityBrowser can easily be transformed to a Mandarin language environment. This paper describes our system implementation, focusing on the language-specific modifications to the original English system. We show that our language-independent yet scalable system infrastructure makes multilingualism a promising task.



A Framework for Developing Conversational User Interfaces

Glass, Weinstein, Cyphers et al.

Computer-Aided Design of User Interfaces IV


Automatic induction of language model data for a spoken dialogue system

Wang, Chung, Seneff 2006

Lang Resources & Evaluation


When building a new spoken dialogue application, large amounts of domain-specific data are required. This paper addresses the issue of generating in-domain training data when little or no real user data are available. The two-stage approach taken begins with a data induction phase whereby linguistic constructs from out-of-domain sentences are harvested and integrated with artificially constructed in-domain phrases. After some syntactic and semantic filtering, a large corpus of synthetically assembled user utterances is induced. The second stage involves sampling the synthetic corpus towards the goal of obtaining data that would be representative of the statistics of application-specific real user interactions. The sampling methods proposed employ an example-based generation framework, a simulated user model and information extracted from development data. Evaluation is conducted on recognition performance in a restaurant information domain. We show that word error rate can be reduced when limited amounts of real user training data are augmented with synthetic data derived by our methods.

