2013 IEEE International Conference on Acoustics, Speech and Signal Processing
DOI: 10.1109/icassp.2013.6639249

Improving ASR by integrating lecture audio and slides

Abstract: We propose a method that combines the audio of a lecture with its supporting slides to improve automatic speech recognition (ASR) performance. We treat the lecture speech and the slides as parallel streams that contain redundant information. We integrate the two streams to bias the recognizer's language model towards the words in the slides, by first aligning the speech with the slide words, thereby correcting errors in the ASR transcripts. We obtain a 5.9% relative WER improvement on a lecture test set, w…
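The abstract does not say exactly how the slide words bias the language model. Purely as a hedged illustration, the sketch below uses a generic technique (linear interpolation of a background unigram model with a unigram model estimated from slide text) and uses it to rescore two competing hypotheses. The function names, the weight `lam = 0.8`, the floor probability and the toy corpora are all hypothetical; this is not the paper's alignment-based method.

```python
import math
from collections import Counter


def unigram(tokens):
    """Maximum-likelihood unigram distribution estimated from a list of tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def interpolate(background, slide, lam=0.8):
    """Linear interpolation P(w) = lam * P_bg(w) + (1 - lam) * P_slide(w).

    A word absent from one component gets probability 0 from that component,
    so the result still sums to 1 over the union vocabulary.
    """
    vocab = set(background) | set(slide)
    return {w: lam * background.get(w, 0.0) + (1 - lam) * slide.get(w, 0.0) for w in vocab}


def log_score(hypothesis, lm, floor=1e-8):
    """Total log-probability of a hypothesis; out-of-vocabulary words get a small floor."""
    return sum(math.log(lm.get(w, floor)) for w in hypothesis.split())


# Hypothetical toy data: a generic background corpus and the text of one slide.
background = unigram("the of a mark of the model".split())
slide = unigram("hidden markov model markov chain".split())
biased = interpolate(background, slide, lam=0.8)

# Two competing recognizer hypotheses for the same audio (hypothetical).
h1, h2 = "markov model", "mark of model"
for name, lm in [("background", background), ("biased", biased)]:
    best = max((h1, h2), key=lambda h: log_score(h, lm))
    print(f"{name:10s} prefers: {best!r}")
```

With the background model alone, "mark of model" scores higher because "markov" is out of vocabulary; after interpolation with the slide model, "markov model" wins. A real recognizer would combine such language-model scores with acoustic scores and use higher-order n-grams; the unigram rescoring here only shows the direction of the bias.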

Cited by 7 publications (3 citation statements). References: 11 publications.
“…This approach was examined for several types of target contents: a pair of the lecture speech transcription and its lecture slide [7], and a pair of the discussion speech transcription and its target newspaper article [8]. The other approach, a machine translation based method, was employed to align a lecture speech signal and its lecture slide, in order to improve automatic speech recognition performance [11]. This paper focuses into the alignment problem between lecture utterances and lecture slide components, thus, obviously belongs to the second category.…”
Section: Related Work (mentioning; confidence: 99%)
“…If only ASR technology is used, it may lead to the wrong recognition of proprietary entities in the current slide. In the field of ASR assisted by slides, some early papers use slides to build language models [22,23], while others use complete static slides to extract rare words and improve results using a contextual bias ASR model [24].…”
Section: Introduction (mentioning; confidence: 99%)
“…Assuming that the Word Error Rate (WER) metric is not relevant enough to compare the ASR system performance for such specific tasks [1,2], we explore the use of more relevant evaluation metrics to analyse the effects of the ASR language model adaptation. Language Model (LM) adaptation of spoken lectures is a well-known issue in the literature [3,4,5,6,7,8,9]. In 2002, [10] authors already demonstrated that the use of a topic-related vocabulary improves speech recognition and indexing for video lectures.…”
Section: Introduction (mentioning; confidence: 99%)
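Both the abstract (a 5.9% relative WER improvement) and the last citation statement rely on the word error rate. As a minimal sketch, assuming whitespace tokenization and no text normalization, WER can be computed from a word-level edit distance, and a relative improvement is the WER reduction divided by the baseline WER. The function names and the example WER values below are hypothetical and are not results from the paper.

```python
def word_errors(reference, hypothesis):
    """Word-level Levenshtein distance: minimum substitutions + deletions + insertions."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion of reference word r
                cur[j - 1] + 1,          # insertion of hypothesis word h
                prev[j - 1] + (r != h),  # substitution, or match if equal
            )
        prev = cur
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word errors divided by the number of reference words."""
    return word_errors(reference, hypothesis) / len(reference.split())


def relative_improvement(baseline_wer, new_wer):
    """Relative WER reduction, e.g. 0.059 for a 5.9% relative improvement."""
    return (baseline_wer - new_wer) / baseline_wer


print(wer("hidden markov model", "hidden mark of model"))  # 2 errors / 3 reference words
print(relative_improvement(0.200, 0.1882))  # hypothetical baseline and new WER
```

Under this definition, a 5.9% relative improvement means the new WER is 94.1% of the baseline WER, which is not the same as a 5.9-point absolute drop.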