Proceedings of the Workshop on Interdisciplinary Approaches to Speech Indexing and Retrieval at HLT-NAACL 2004 - SpeechIR '04 2004
DOI: 10.3115/1626307.1626309

Analysis and processing of lecture audio data

Abstract: In this paper we report on our recent efforts to collect a corpus of spoken lecture material that will enable research directed towards fast, accurate, and easy access to lecture content. Thus far, we have collected a corpus of 270 hours of speech from a variety of undergraduate courses and seminars. We report on an initial analysis of the spontaneous speech phenomena present in these data and the vocabulary usage patterns across three courses. Finally, we examine language model perplexities trained from writt…

Cited by 70 publications (51 citation statements)
References 14 publications

“…We conducted spoken utterance retrieval experiments with lecture speech recordings in MIT lecture corpus [5]. The corpus is a collection of audio-visual recordings of lectures and seminars presented at MIT, which is approximately 300 hours containing lectures from eight different courses and from 80 seminars given on a variety of topics.…”
Section: Methods (mentioning)
confidence: 99%
“…In SUR experiments with lecture speech recordings from MIT lecture corpus [5], we compared several indexing methods including word/phone confusion networks and the combined networks.…”
Section: Introduction (mentioning)
confidence: 99%
“…Lecture transcription has been the target of much bigger research projects such as the Japanese project described in [2], the European project CHIL (Computers In The Human Communication Loop) [3], and the American iCampus Spoken Lecture Processing project [4]. In some of these projects, the concept of lecture is different.…”
Section: Introduction (mentioning)
confidence: 99%
“…The speech data used in this paper is taken from a corpus of audio lectures collected at MIT [1]. The entire corpus consists of approximately 300 hours of lectures from a variety of academic courses and seminars.…”
Section: Speech Data (mentioning)
confidence: 99%
“…In particular, we have observed that for many educational lectures, the active vocabulary is typically very small, but includes many topic specific terms and phrases that are not frequently occurring in conversational speech [1]. Recognition experiments on these lectures show that reducing the OOV rate improves accuracy, but that including unnecessary words is detrimental to performance [2].…” [Footnote in the citing paper: Support for this research was provided in part by the National Science Foundation under grant #IIS-0415865.]
Section: Introduction (mentioning)
confidence: 99%
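
The last citation statement refers to the out-of-vocabulary (OOV) rate of a recognizer vocabulary on lecture speech. As a rough illustration of that quantity only, the sketch below computes the fraction of running test tokens missing from a fixed vocabulary. The function names `oov_rate` and `top_k_vocabulary`, the whitespace tokenization, and the toy sentences are assumptions made for this example; they are not taken from the paper or from the citing works.

```python
from collections import Counter


def top_k_vocabulary(train_tokens, k):
    """Return the k most frequent training tokens (hypothetical helper)."""
    return [word for word, _ in Counter(train_tokens).most_common(k)]


def oov_rate(vocabulary, test_tokens):
    """Fraction of running test tokens that are not in the vocabulary."""
    if not test_tokens:
        return 0.0
    vocab = set(vocabulary)
    misses = sum(1 for token in test_tokens if token not in vocab)
    return misses / len(test_tokens)


# Toy data, invented for illustration; real experiments would use
# lecture transcripts and a recognizer's actual word list.
train = "the eigenvalue of the matrix is the root of the polynomial".split()
test = "the eigenvector of the matrix".split()

vocab = top_k_vocabulary(train, 5)
print(f"OOV rate: {oov_rate(vocab, test):.2f}")
```

Raising `k` lowers the OOV rate on topic-specific terms but also admits words the lecture never uses, which is the trade-off the quoted statement describes.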