2009 IEEE International Conference on Acoustics, Speech and Signal Processing
DOI: 10.1109/icassp.2009.4960706

Language model parameter estimation using user transcriptions

Abstract: In limited data domains, many effective language modeling techniques construct models with parameters to be estimated on an in-domain development set. However, in some domains, no such data exist beyond the unlabeled test corpus. In this work, we explore the iterative use of the recognition hypotheses for unsupervised parameter estimation. We also evaluate the effectiveness of supervised adaptation using varying amounts of user-provided transcripts of utterances selected via multiple strategies. While unsuperv…
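
The abstract's core idea, unsupervised estimation of language model parameters from the recognizer's own hypotheses, can be illustrated with a short sketch. The example below assumes (as is common for this family of techniques, though the abstract does not spell it out) that the parameters are linear interpolation weights over component n-gram LMs, re-estimated by EM on the words of the 1-best hypotheses. The component models and probabilities here are toy, hypothetical values.

```python
# Minimal sketch of unsupervised interpolation-weight estimation: the
# "development text" is the recognizer's own hypotheses, and the
# parameters are the mixture weights of K component LMs.  A real system
# would use n-gram LMs and re-decode between iterations.

def em_interpolation_weights(component_probs, num_iters=20):
    """component_probs: list over hypothesis words of [p_1(w), ..., p_K(w)],
    the probability each of the K component LMs assigns to each word.
    Returns EM-estimated interpolation weights."""
    K = len(component_probs[0])
    lam = [1.0 / K] * K                        # uniform initialization
    for _ in range(num_iters):
        counts = [0.0] * K
        for probs in component_probs:
            mix = sum(l * p for l, p in zip(lam, probs))
            for k in range(K):                 # E-step: component posteriors
                counts[k] += lam[k] * probs[k] / mix
        total = sum(counts)
        lam = [c / total for c in counts]      # M-step: renormalize
    return lam

# Hypothetical example: two component LMs scored on three hypothesis words.
probs_per_word = [[0.010, 0.002], [0.004, 0.008], [0.020, 0.001]]
print(em_interpolation_weights(probs_per_word))
```
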

Cited by 5 publications (4 citation statements) · References 12 publications (15 reference statements)

“…Language model adaptation has been widely studied in the literature. Hsu et al. [1] explored the iterative use of ASR hypotheses for unsupervised parameter estimation for n-gram language models. Similarly, [2,3,4] proposed unsupervised adaptation methods for presentation lecture speech recognition.…”
Section: Related Work
confidence: 99%
“…Moreover, the problem of frequently occurring errors in automatic transcription of lecture speech was addressed in [12] by correcting colloquial expressions, deleting fillers, and inserting periods using statistical post-processing techniques. The authors of [13] and [14] explore recognition hypotheses and the effectiveness of supervised and unsupervised adaptation with varying amounts of user-provided transcripts to tune language model parameters on an English lecture transcription task.…”
Section: Introduction
confidence: 99%
“…This paper focuses on language modeling to improve recognition performance, because language modeling is another important and challenging issue in transcribing podcasts and similar content, such as YouTube video clips featuring spoken documents. In the literature, various work has been done on language model (LM) adaptation for several LVCSR tasks, such as broadcast news [6][7], meetings [8][9], and lectures [10][11]. These works generally adapt a main (or background) LM, trained on large amounts of task-specific text data, using an additional resource such as in-domain (on-topic) text data [7][8], web-based text data [6][11], or user-provided text data [9][10].…”
Section: Introduction
confidence: 99%
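
The adaptation pattern this statement describes, a large background LM combined with a small additional resource, is most simply realized as linear interpolation of the two models. A minimal sketch with toy unigram models follows; all probabilities are hypothetical.

```python
# Linear interpolation of a background LM with a small adaptation LM built
# from the extra resource (in-domain, web, or user-provided text):
#     P(w) = (1 - lam) * P_bg(w) + lam * P_adapt(w)

def interpolate(p_background, p_adapt, lam):
    """Mix two unigram distributions with weight `lam` on the adaptation LM."""
    vocab = set(p_background) | set(p_adapt)
    return {w: (1 - lam) * p_background.get(w, 0.0)
               + lam * p_adapt.get(w, 0.0)
            for w in vocab}

p_bg = {"the": 0.05, "meeting": 0.001, "podcast": 0.0001}
p_ad = {"the": 0.04, "podcast": 0.01, "episode": 0.005}
p_mix = interpolate(p_bg, p_ad, lam=0.3)
print(p_mix["podcast"])   # boosted by the adaptation LM
```
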
“…In our target task, however, which covers a diverse range of topics, such a large-scale task-specific corpus cannot be prepared in advance to train the background LM.…”
Section: Introduction
confidence: 99%