2009 IEEE International Conference on Acoustics, Speech and Signal Processing
DOI: 10.1109/icassp.2009.4960625

Discriminative training of hierarchical acoustic models for large vocabulary continuous speech recognition

Abstract: In this paper we propose discriminative training of hierarchical acoustic models for large vocabulary continuous speech recognition tasks. After presenting our hierarchical modeling framework, we describe how the models can be generated with either Minimum Classification Error or large-margin training. Experiments on a large vocabulary lecture transcription task show that the hierarchical model can yield more than 1.0% absolute word error rate reduction over non-hierarchical models for both kinds of discriminative training.
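As a rough sketch of the two criteria named in the abstract, the LaTeX below gives a standard Minimum Classification Error loss and a hinge-style large-margin objective. The notation (discriminant scores g_j, smoothing parameters eta and gamma, margin rho, number of classes M) is assumed for illustration and is not taken from the paper, which may define its objectives differently.

\documentclass{article}
\usepackage{amsmath}
\begin{document}

% Assumed notation: X is an utterance, y its reference label, M the
% number of competing classes, and g_j(X; \Lambda) a discriminant score
% (e.g., a log-likelihood) for class j under acoustic-model parameters
% \Lambda.

% Minimum Classification Error: a smoothed misclassification measure
% d(X) passed through a sigmoid loss with slope \gamma.
\[
  d(X) = -\,g_y(X;\Lambda)
  + \frac{1}{\eta}\log\!\Big[\frac{1}{M-1}\sum_{j \neq y} e^{\eta\, g_j(X;\Lambda)}\Big],
  \qquad
  \ell_{\mathrm{MCE}}(X) = \frac{1}{1 + e^{-\gamma\, d(X)}}.
\]

% Large-margin training: penalize any competitor whose score comes
% within a margin \rho(j,y) of the reference score; [z]_+ = max(0, z).
\[
  \ell_{\mathrm{LM}}(X) =
  \max_{j \neq y}
  \Big[\, \rho(j,y) - \big(g_y(X;\Lambda) - g_j(X;\Lambda)\big) \Big]_{+}.
\]

\end{document}

Both quantities would typically be summed over the training set and minimized with respect to \Lambda; the unit over which the competitors j range (phones, states, or word strings) is left unspecified in this sketch.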

Cited by 12 publications (7 citation statements). References 11 publications.
“…To compute WER, we use a speaker-independent speech recognizer [12] with a large-margin discriminative hierarchical acoustic model [13]. The lectures are pre-segmented into utterances via forced alignment against the reference transcripts [14].…”
Section: Setup (mentioning)
confidence: 99%
“…In this paper, we extend our discriminative ETC method to the detection of deletion errors and apply it to recognition rate estimation (Section 2.2). In the experiments on the MIT lecture speech corpus [12], we obtained accurate recognition rate estimation results with our extended discriminative ETC method (Section 3.3).…”
Section: Introduction (mentioning)
confidence: 99%
“…OCW/MIT-World error rates for different approaches. Speech recognition experiments were carried out for the MIT OpenCourseWare (OCW) and MIT-World lecture speech corpus [13]. The training set used for this task consists of 101 hours of audio data and the evaluation set of 10 hours of audio.…”
(mentioning)
confidence: 99%