2009 IEEE International Conference on Acoustics, Speech and Signal Processing
DOI: 10.1109/icassp.2009.4960706

Language model parameter estimation using user transcriptions

Abstract: In limited data domains, many effective language modeling techniques construct models with parameters to be estimated on an in-domain development set. However, in some domains, no such data exist beyond the unlabeled test corpus. In this work, we explore the iterative use of the recognition hypotheses for unsupervised parameter estimation. We also evaluate the effectiveness of supervised adaptation using varying amounts of user-provided transcripts of utterances selected via multiple strategies. While unsuperv…
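
The abstract's core idea, unsupervised estimation of language model parameters from the recognizer's own hypotheses, can be illustrated with a short sketch. The example below assumes (as is common for this family of techniques, though the abstract does not spell it out) that the parameters are linear interpolation weights over component n-gram LMs, re-estimated by EM on the words of the 1-best hypotheses. The component models and probabilities here are toy, hypothetical values.

```python
# Minimal sketch of unsupervised interpolation-weight estimation: the
# "development text" is the recognizer's own hypotheses, and the
# parameters are the mixture weights of K component LMs.  A real system
# would use n-gram LMs and re-decode between iterations.

def em_interpolation_weights(component_probs, num_iters=20):
    """component_probs: list over hypothesis words of [p_1(w), ..., p_K(w)],
    the probability each of the K component LMs assigns to each word.
    Returns EM-estimated interpolation weights."""
    K = len(component_probs[0])
    lam = [1.0 / K] * K                        # uniform initialization
    for _ in range(num_iters):
        counts = [0.0] * K
        for probs in component_probs:
            mix = sum(l * p for l, p in zip(lam, probs))
            for k in range(K):                 # E-step: component posteriors
                counts[k] += lam[k] * probs[k] / mix
        total = sum(counts)
        lam = [c / total for c in counts]      # M-step: renormalize
    return lam

# Hypothetical example: two component LMs scored on three hypothesis words.
probs_per_word = [[0.010, 0.002], [0.004, 0.008], [0.020, 0.001]]
print(em_interpolation_weights(probs_per_word))
```
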

Cited by 5 publications (4 citation statements) · References 12 publications (15 reference statements)

“…Language model adaptation has been widely studied in the literature. Hsu et al. [1] explored the iterative use of ASR hypotheses for unsupervised parameter estimation for n-gram language models. Similarly, [2,3,4] proposed unsupervised adaptation methods for presentation lecture speech recognition.…”
Section: Related Work
confidence: 99%
“…Moreover, the problem of frequently occurring errors in automatic transcription of lecture speech was addressed in [12] by correcting colloquial expressions, deleting fillers, and inserting periods using statistical post-processing techniques. The authors of [13] and [14] explore recognition hypotheses and the effectiveness of supervised and unsupervised adaptation with varying amounts of user-provided transcripts to tune language model parameters on an English lecture transcription task.…”
Section: Introduction
confidence: 99%
“…This paper focuses on language modeling to improve recognition performance, because language modeling is another important and challenging issue in transcribing podcasts and similar content, such as YouTube video clips featuring spoken documents. In the literature, various work has been done on language model (LM) adaptation for several LVCSR tasks, such as broadcast news [6][7], meetings [8][9], and lectures [10][11]. These works generally adapt a main (or background) LM, trained on large amounts of task-specific text data, using an additional resource such as in-domain (on-topic) text data [7][8], web-based text data [6][11], or user-provided text data [9][10].…”
Section: Introduction
confidence: 99%
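
The adaptation pattern this statement describes, a large background LM combined with a small additional resource, is most simply realized as linear interpolation of the two models. A minimal sketch with toy unigram models follows; all probabilities are hypothetical.

```python
# Linear interpolation of a background LM with a small adaptation LM built
# from the extra resource (in-domain, web, or user-provided text):
#     P(w) = (1 - lam) * P_bg(w) + lam * P_adapt(w)

def interpolate(p_background, p_adapt, lam):
    """Mix two unigram distributions with weight `lam` on the adaptation LM."""
    vocab = set(p_background) | set(p_adapt)
    return {w: (1 - lam) * p_background.get(w, 0.0)
               + lam * p_adapt.get(w, 0.0)
            for w in vocab}

p_bg = {"the": 0.05, "meeting": 0.001, "podcast": 0.0001}
p_ad = {"the": 0.04, "podcast": 0.01, "episode": 0.005}
p_mix = interpolate(p_bg, p_ad, lam=0.3)
print(p_mix["podcast"])   # boosted by the adaptation LM
```
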
“…In our target task, however, which covers a diverse range of topics, such a large-scale task-specific corpus cannot be prepared in advance to train the background LM.…”
Section: Introduction
confidence: 99%