2006
DOI: 10.1007/s10579-006-9007-3

Automatic induction of language model data for a spoken dialogue system

Abstract: When building a new spoken dialogue application, large amounts of domain-specific data are required. This paper addresses the issue of generating in-domain training data when little or no real user data are available. The two-stage approach taken begins with a data induction phase whereby linguistic constructs from out-of-domain sentences are harvested and integrated with artificially constructed in-domain phrases. After some syntactic and semantic filtering, a large corpus of synthetically assembled user utterances…
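
To make the two-stage approach concrete, the following is a minimal sketch of the induction-and-assembly idea. The out-of-domain sentences, content-phrase patterns, slot marker, in-domain phrases, and length/keyword filter are all invented for exposition; they do not stand for the authors' actual grammar, data, or filtering components.

```python
# Illustrative sketch only: the corpus fragments, regular expressions, slot
# marker, and filter below are invented and are not the paper's actual
# grammar, data, or filtering criteria.
import itertools
import re

# Stage 1 (induction): harvest carrier constructs from out-of-domain sentences
# by replacing their content phrases with a generic slot.
OUT_OF_DOMAIN = [
    "could you show me flights to boston please",
    "i would like to book a table for two",
]
CONTENT_PATTERNS = [r"flights to \w+", r"a table for \w+"]
SLOT = "<NP>"

def harvest_templates(sentences, patterns):
    templates = set()
    for sent in sentences:
        for pat in patterns:
            if re.search(pat, sent):
                templates.add(re.sub(pat, SLOT, sent))
    return sorted(templates)

# Artificially constructed in-domain phrases (imagine a weather domain).
IN_DOMAIN_PHRASES = ["the forecast for seattle", "tomorrow's weather in denver"]

# Stage 2 (assembly + filtering): plug in-domain phrases into the harvested
# templates, then keep only utterances that pass a crude length/keyword check,
# standing in for the paper's syntactic and semantic filtering.
def assemble(templates, phrases):
    for template, phrase in itertools.product(templates, phrases):
        utterance = template.replace(SLOT, phrase)
        if 3 <= len(utterance.split()) <= 15 and ("weather" in utterance or "forecast" in utterance):
            yield utterance

if __name__ == "__main__":
    for utterance in assemble(harvest_templates(OUT_OF_DOMAIN, CONTENT_PATTERNS), IN_DOMAIN_PHRASES):
        print(utterance)
```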

Cited by 6 publications (8 citation statements)
References 26 publications (15 reference statements)

“…For statistical language model adaptation there is a large body of work (see [11] for a review). Recently there have been a few studies oriented toward dialogue systems, e.g., [12,13,14]. One of the recent trends is to exploit web data [15,13], esp.…”
Section: Related Work (mentioning)
confidence: 99%
“…But data from the web is rarely truly in-domain; at best it is near in-domain. Wang et al. [14] transformed out-of-domain data and then applied filtering and re-sampling to obtain near in-domain data. We use no out-of-domain data and obtain data only through induction and then generation.…”
Section: Related Work (mentioning)
confidence: 99%
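
The filtering and re-sampling step referred to in the statement above can be pictured with a small sketch. The in-domain seed, candidate sentences, add-one unigram scorer, threshold, and sample size below are assumptions made purely for illustration, not the method's actual scoring or selection criteria.

```python
# Hypothetical illustration of filtering and re-sampling near in-domain text;
# the seed data, unigram scorer, threshold, and sample size are made up.
import math
import random
from collections import Counter

IN_DOMAIN_SEED = ["what is the weather in boston", "forecast for tomorrow"]
CANDIDATES = [
    "show me the weather report for seattle",
    "the stock closed higher on friday",
    "is it going to rain tomorrow in boston",
]

# Add-one smoothed unigram model estimated from the tiny in-domain seed.
counts = Counter(word for sent in IN_DOMAIN_SEED for word in sent.split())
total = sum(counts.values())
vocab_size = len(counts)

def per_word_logprob(sentence):
    words = sentence.split()
    return sum(math.log((counts[w] + 1) / (total + vocab_size + 1)) for w in words) / len(words)

# Filter: keep candidates whose per-word score clears an (arbitrary) threshold;
# re-sample: draw the adaptation corpus with probability proportional to score.
scored = [(s, per_word_logprob(s)) for s in CANDIDATES]
kept = [(s, lp) for s, lp in scored if lp > -2.7]
if kept:
    weights = [math.exp(lp) for _, lp in kept]
    adaptation_corpus = random.choices([s for s, _ in kept], weights=weights, k=5)
    print(adaptation_corpus)
```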
“…There is some LMA (language model adaptation) work oriented toward dialogue systems [10], [11], [12]. In our experiments, a trigram model was trained on the large text collection (T).…”
Section: Language Model Adaptation (mentioning)
confidence: 99%
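
For readers unfamiliar with the terminology, training "a trigram model on the large text collection (T)" reduces to estimating smoothed trigram probabilities from counts. The sketch below does this with a toy collection and add-one smoothing, which stand in for whatever corpus and smoothing scheme the cited experiments actually used.

```python
# Minimal count-based trigram language model; the toy collection T and the
# add-one smoothing are placeholders, not the cited experimental setup.
from collections import defaultdict

T = [
    "show me flights from boston to denver",
    "show me the weather in boston",
]

trigram_counts = defaultdict(int)
bigram_counts = defaultdict(int)
vocab = set()

for sentence in T:
    tokens = ["<s>", "<s>"] + sentence.split() + ["</s>"]
    vocab.update(tokens)
    for i in range(2, len(tokens)):
        trigram_counts[(tokens[i - 2], tokens[i - 1], tokens[i])] += 1
        bigram_counts[(tokens[i - 2], tokens[i - 1])] += 1

def trigram_prob(w1, w2, w3):
    # Add-one smoothed estimate of P(w3 | w1, w2).
    return (trigram_counts[(w1, w2, w3)] + 1) / (bigram_counts[(w1, w2)] + len(vocab))

print(trigram_prob("show", "me", "flights"))  # seen trigram, higher probability
print(trigram_prob("show", "me", "weather"))  # unseen continuation, smoothed
```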
“…E.g., Wang et al. took a two-step approach to generate in-domain data through out-of-domain data transformation [12]. [15] extracts useful information from previous domains and the World Wide Web without any in-domain data.…”
Section: Introduction (mentioning)
confidence: 99%