[Proceedings] ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing
DOI: 10.1109/icassp.1991.150395
An improved MMIE training algorithm for speaker-independent, small vocabulary, continuous speech recognition

Abstract: [Centre de recherche informatique de Montréal (CRIM), Montréal; Communications Systems Group] In several speech recognition tasks, Maximum Mutual Information Estimation (MMIE) of Hidden Markov Model (HMM) parameters can substantially improve recognition results [1,2]. However, it is usually implemented using gradient descent, which can have very slow convergence. Recently, Gopalakrishnan et al. [3] introduced a reestimation formula… More recently, a different formula for discrete…

Cited by 49 publications (27 citation statements)
References 9 publications (4 reference statements)
“…In the last decade, much effort has been put in finding more efficient and reliable algorithms [3], [4], [8]-[11]. The de-facto standard to optimize discriminative HMMs in speech recognition is Extended Baum-Welch (EBW) [12], [13], or more precisely empirical variants thereof [7], [10], [14].…”
Section: Introduction
confidence: 99%
“…Hence, although auxiliary functions may be computed for each of these, the difference of two lower bounds is not itself a lower bound and so standard EM cannot be used. To handle this problem, the extended Baum-Welch (EBW) criterion was proposed [64,129]. In this case, standard EM-like auxiliary functions are defined for the numerator and denominator but stability during re-estimation is achieved by adding scaled current model parameters to the numerator statistics.…”
Section: Parameter Estimation
confidence: 99%
“…Most applications of discriminative training methods for speech recognition use either the maximum mutual information (MMI) (Bahl et al., 1986; Brown, 1987; Cardin et al., 1993; Chow, 1990; Kapadia et al., 1993; Normandin, 1996; Normandin et al., 1994a,b; Normandin and Morgera, 1991; Reichl and Ruske, 1995; Valtchev et al., 1996, 1997) or the minimum classification error (MCE) (Chou et al., 1992, 1993, 1994; Paliwal et al., 1995; Reichl and Ruske, 1995) criterion. In MCE training, an approximation to the error rate on the training data is optimized, whereas MMI training optimizes the a posteriori probability of the training utterances and hence the class separability.…”
Section: Introduction
confidence: 99%
“…EB is an extension to the standard Baum-Welch algorithm designed for optimization of the MMI criterion. EB was first developed for discriminative training of discrete probabilities (Cardin et al., 1993; Gopalakrishnan et al., 1991; Normandin et al., 1994a; Normandin and Morgera, 1991), but was later extended to continuous densities (Normandin, 1991, 1996). Optimization of the MCE criterion is usually performed in combination with GD.…”
Section: Introduction
confidence: 99%