2009 IEEE International Conference on Acoustics, Speech and Signal Processing
DOI: 10.1109/icassp.2009.4960625

Discriminative training of hierarchical acoustic models for large vocabulary continuous speech recognition

Abstract: In this paper we propose discriminative training of hierarchical acoustic models for large vocabulary continuous speech recognition tasks. After presenting our hierarchical modeling framework, we describe how the models can be generated with either Minimum Classification Error or large-margin training. Experiments on a large vocabulary lecture transcription task show that the hierarchical model can yield more than 1.0% absolute word error rate reduction over non-hierarchical models for both kinds of discriminative training.
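As a rough sketch of the two criteria named in the abstract, the LaTeX below gives a standard Minimum Classification Error loss and a hinge-style large-margin objective. The notation (discriminant scores g_j, smoothing parameters eta and gamma, margin rho, number of classes M) is assumed for illustration and is not taken from the paper, which may define its objectives differently.

\documentclass{article}
\usepackage{amsmath}
\begin{document}

% Assumed notation: X is an utterance, y its reference label, M the
% number of competing classes, and g_j(X; \Lambda) a discriminant score
% (e.g., a log-likelihood) for class j under acoustic-model parameters
% \Lambda.

% Minimum Classification Error: a smoothed misclassification measure
% d(X) passed through a sigmoid loss with slope \gamma.
\[
  d(X) = -\,g_y(X;\Lambda)
  + \frac{1}{\eta}\log\!\Big[\frac{1}{M-1}\sum_{j \neq y} e^{\eta\, g_j(X;\Lambda)}\Big],
  \qquad
  \ell_{\mathrm{MCE}}(X) = \frac{1}{1 + e^{-\gamma\, d(X)}}.
\]

% Large-margin training: penalize any competitor whose score comes
% within a margin \rho(j,y) of the reference score; [z]_+ = max(0, z).
\[
  \ell_{\mathrm{LM}}(X) =
  \max_{j \neq y}
  \Big[\, \rho(j,y) - \big(g_y(X;\Lambda) - g_j(X;\Lambda)\big) \Big]_{+}.
\]

\end{document}

Both quantities would typically be summed over the training set and minimized with respect to \Lambda; the unit over which the competitors j range (phones, states, or word strings) is left unspecified in this sketch.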

Cited by 12 publications (7 citation statements). References 11 publications.
“…To compute WER, we use a speaker-independent speech recognizer [12] with a large-margin discriminative hierarchical acoustic model [13]. The lectures are pre-segmented into utterances via forced alignment against the reference transcripts [14].…”
Section: Setup (mentioning)
confidence: 99%
“…In this paper, we extend our discriminative ETC method to the detection of deletion errors and apply it to recognition rate estimation (Section 2.2). In the experiments on the MIT lecture speech corpus [12], we obtained accurate recognition rate estimation results with our extended discriminative ETC method (Section 3.3).…”
Section: Introduction (mentioning)
confidence: 99%
“…OCW/MIT-World error rates for different approaches. Speech recognition experiments were carried out for the MIT OpenCourseWare (OCW) and MIT-World lecture speech corpus [13]. The training set used for this task consists of 101 hours of audio data and the evaluation set of 10 hours of audio.…”
(mentioning)
confidence: 99%