Language model adaptation for video lectures transcription

Martinez-Villaronga, Adria A.; Agua, Miguel A. del; Andrés-Ferrer, Jesús; Juan, Alfons

doi:10.1109/icassp.2013.6639314

Cited by 15 publications (17 citation statements)

References 14 publications


“…We compare our approach with a strong baseline computed from a large collection of out-of-domain and in-domain documents comprising 46 billion words. Furthermore, we compare our results with those obtained by slide adaptation [15], using as slides the text extracted from the video using OCR. We also combine both approaches to further improve adaptation which yields significant improvements with respect to both the baseline model and the slide-adapted model.…”

Section: Introduction (mentioning, confidence: 95%)

“…In this work, we further consider the scenario where the lecture slides can be extracted from the video using OCR and they are available to adapt the models [15], or a mixed scenario that combines both the text in the slides and the retrieved documents as follows…”

Section: Language Model Adaptation Technique (mentioning, confidence: 99%)

“…The language model adaptation technique for video lectures was introduced in [15]. It combines out-of-domain language models, in-domain models and video-specific models by means of a linear interpolation:…”

Section: Language Model Adaptation Technique (mentioning, confidence: 99%)
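The excerpt above says the technique from [15] combines out-of-domain, in-domain and video-specific models by linear interpolation, but the formula itself is cut off. Standard linear interpolation defines P(w|h) = Σ_i λ_i P_i(w|h) with λ_i ≥ 0 and Σ_i λ_i = 1. The Python below is a minimal sketch of that general technique. It uses hypothetical toy unigram models and weights (none of these numbers come from the paper), and it assumes the weights are estimated by EM on held-out text, which is a common choice; the excerpt does not say how [15] tuned them. The names `interpolate` and `estimate_weights` are illustrative only.

```python
from typing import Dict, List, Sequence

# Hypothetical toy unigram distributions over a shared vocabulary. A real
# lecture-transcription system would interpolate n-gram models P_i(w | h).
Model = Dict[str, float]


def interpolate(models: Sequence[Model], weights: Sequence[float], word: str) -> float:
    """Linear interpolation: P(w) = sum_i lambda_i * P_i(w)."""
    if len(models) != len(weights):
        raise ValueError("need one weight per model")
    if any(lam < 0 for lam in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(lam * m.get(word, 0.0) for lam, m in zip(weights, models))


def estimate_weights(models: Sequence[Model], dev_words: Sequence[str],
                     iterations: int = 50) -> List[float]:
    """EM re-estimation of the interpolation weights on held-out text.

    Each iteration computes, for every dev word, the posterior share of each
    component model and sets lambda_i to the average share; an EM step never
    decreases the dev-set likelihood.
    """
    k = len(models)
    weights = [1.0 / k] * k
    for _ in range(iterations):
        expected = [0.0] * k
        for w in dev_words:
            parts = [lam * m.get(w, 0.0) for lam, m in zip(weights, models)]
            total = sum(parts)
            if total == 0.0:
                continue  # no component assigns probability to w; skip it
            for i in range(k):
                expected[i] += parts[i] / total
        norm = sum(expected)
        if norm == 0.0:
            break  # nothing in the dev set was scored; keep current weights
        weights = [e / norm for e in expected]
    return weights


# Illustrative models: each sums to 1 over {"the", "lecture", "matrix", "cat"}.
out_of_domain: Model = {"the": 0.5, "lecture": 0.1, "matrix": 0.05, "cat": 0.35}
in_domain: Model = {"the": 0.4, "lecture": 0.3, "matrix": 0.2, "cat": 0.1}
video_specific: Model = {"the": 0.3, "lecture": 0.2, "matrix": 0.5}
models = [out_of_domain, in_domain, video_specific]

mixed = {w: interpolate(models, [0.5, 0.3, 0.2], w) for w in out_of_domain}
lambdas = estimate_weights(models, ["matrix", "matrix", "lecture", "the"])
```

Because the weights are a convex combination, the interpolated model stays a proper distribution; on this toy dev text, which is dominated by lecture-specific words, EM shifts most of the weight toward the video-specific model.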

“…For instance, it is usual that either the author does not give access to the slide document, or the repository does not keep track of such files. When slides are not available in electronic format, they can be extracted from the video recording using OCR techniques [15]. However, due to the video quality, even this is not possible in many cases.…”

Section: Introduction (mentioning, confidence: 99%)

“…Despite current state-of-the-art automatic speech recognition (ASR) systems are achieving continuous improvements over time, these repositories can be greatly improved through the use of specifically retrieved in-domain data. For instance, in [15] the video-dependent in-domain data is extracted/retrieved from the slides used in each video. Specifically, a general-purpose ASR system was adapted through language model interpolation from different resources (out-of-domain and in-domain,) including the text of video-dependent slides.…”

Section: Introduction (mentioning, confidence: 99%)


Language Model Adaptation for Lecture Transcription by Document Retrieval

Martinez-Villaronga, Agua, Silvestre-Cerdà et al. 2014

Advances in Speech and Language Technologies for Iberian Languages

Abstract. With the spread of MOOCs and video lecture repositories it is more important than ever to have accurate methods for automatically transcribing video lectures. In this work, we propose a simple yet effective language model adaptation technique based on document retrieval from the web. This technique is combined with slide adaptation, and compared against a strong baseline language model and a stronger slide-adapted baseline. These adaptation techniques are compared using two different acoustic models: a standard HMM model and the CD-DNN-HMM model. The proposed method obtains WER improvements of up to 14% relative with respect to a competitive baseline, and also outperforms slide adaptation.
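The abstract reports WER improvements "of up to 14% relative". To make the distinction between relative and absolute improvement concrete, here is a small Python sketch (the function names are hypothetical). It computes WER as the word-level edit distance (substitutions + deletions + insertions) divided by the reference length, and the relative reduction as (baseline - adapted) / baseline.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length,
    computed with a word-level Levenshtein distance."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # At the start of iteration i, prev[j] is the edit distance
    # between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        cur = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            substitute = prev[j - 1] + (ref[i - 1] != hyp[j - 1])
            delete = prev[j] + 1
            insert = cur[j - 1] + 1
            cur[j] = min(substitute, delete, insert)
        prev = cur
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, adapted_wer: float) -> float:
    """Relative improvement (baseline - adapted) / baseline, as a fraction."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - adapted_wer) / baseline_wer
```

For example, with purely illustrative numbers that do not come from the paper, a baseline WER of 25.0% and an adapted WER of 21.5% give `relative_reduction(0.25, 0.215)` of about 0.14. That is a 14% relative improvement, but only 3.5 points absolute.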


Unsupervised Language Model Adaptation by Data Selection for Speech Recognition

Khassanov, Chong, Bigot et al. 2017

Intelligent Information and Database Systems

Speaker-adapted confidence measures for speech recognition of video lectures

Sanchez-Cortina, Andrés-Ferrer, Sanchís et al. 2016

Computer Speech & Language (self-citation)

Automatic Speech Recognition applications can benefit from a confidence measure (CM) to predict the reliability of the output. Previous works showed that a word-dependent naïve Bayes (NB) classifier outperforms the conventional word posterior probability as a CM. However, a discriminative formulation usually renders improved performance due to the available training techniques. Taking this into account, we propose a logistic regression (LR) classifier defined with simple input functions to approximate the NB behaviour. Additionally, as a main contribution, we propose to adapt the CM to the speaker in cases in which it is possible to identify the speakers, such as online lecture repositories. The experiments have shown that speaker-adapted models outperform their non-adapted counterparts on two difficult tasks from English (videoLectures.net) and Spanish (poliMedia) educational lectures. They have also shown that the NB model is clearly superseded by the proposed LR classifier.
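The abstract above proposes a logistic-regression (LR) confidence measure that can be adapted to individual speakers. The Python below is a minimal, hypothetical sketch of the general idea, not the paper's feature set or training recipe. Each recognised word is described by a feature vector (assumed here to be a single word-posterior value) and labelled 1 if correct or 0 if erroneous, and an LR classifier is fitted by plain full-batch gradient descent. Speaker adaptation is illustrated, as an assumption, by continuing training from a speaker-independent model on one speaker's data; the paper may formulate adaptation differently. The names `confidence`, `train` and `init` are illustrative only.

```python
import math
from typing import List, Optional, Sequence, Tuple

Example = Tuple[Sequence[float], int]  # (feature vector, 1 = word correct, 0 = error)


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def confidence(weights: Sequence[float], bias: float, x: Sequence[float]) -> float:
    """Estimated probability that a recognised word is correct."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def train(data: Sequence[Example],
          init: Optional[Tuple[List[float], float]] = None,
          lr: float = 1.0, epochs: int = 1000) -> Tuple[List[float], float]:
    """Full-batch gradient descent on the mean logistic (cross-entropy) loss.

    Passing `init` starts from existing parameters, e.g. a speaker-independent
    model, so that further epochs on one speaker's data adapt it to that speaker.
    """
    n = len(data[0][0])
    if init is not None:
        w, b = list(init[0]), init[1]
    else:
        w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n
        grad_b = 0.0
        for x, y in data:
            err = confidence(w, b, x) - y  # derivative of the loss w.r.t. the logit
            for k in range(n):
                grad_w[k] += err * x[k]
            grad_b += err
        w = [wk - lr * g / len(data) for wk, g in zip(w, grad_w)]
        b -= lr * grad_b / len(data)
    return w, b
```

A speaker-independent model would be trained on pooled data with `train(pooled)`, and a speaker-adapted one with `train(speaker_data, init=(w, b), epochs=50)`. Starting from the pooled parameters keeps the adapted model close to the general one when a speaker contributes little data.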
