1994
DOI: 10.1109/89.260364

Combining TDNN and HMM in a hybrid system for improved continuous-speech recognition

Abstract: The paper presents a hybrid continuous-speech recognition system that leads to improved results on the speaker dependent DARPA Resource Management task. This hybrid system, called the combined system, is based on a combination of normalized neural network output scores with hidden Markov model (HMM) emission probabilities. The neural network is trained under mean square error and the HMM is trained under maximum likelihood estimation. In theory, whatever criterion may be used, the same word error rate should b…
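The abstract describes combining normalized neural-network output scores with HMM emission probabilities. As a rough illustration of that idea only (not the paper's exact formulation), the sketch below interpolates the two score streams frame by frame; the interpolation weight `lam` and the per-frame normalization are assumptions made for this example.

```python
import numpy as np

def combined_emission_scores(nn_outputs, hmm_likelihoods, lam=0.5):
    """Sketch: combine NN scores and HMM emission probabilities per frame.

    nn_outputs:      array of shape (T, S) -- raw network outputs per state
    hmm_likelihoods: array of shape (T, S) -- HMM emission probabilities
    lam:             interpolation weight in [0, 1] (assumed, not from the paper)
    """
    # Normalize network outputs so each frame sums to one (posterior-like scores).
    nn_scores = nn_outputs / np.clip(nn_outputs.sum(axis=1, keepdims=True), 1e-12, None)
    # Normalize HMM likelihoods per frame so both terms are on a comparable scale.
    hmm_scores = hmm_likelihoods / np.clip(hmm_likelihoods.sum(axis=1, keepdims=True), 1e-12, None)
    # Linear combination; the result would replace the emission term during decoding.
    return lam * nn_scores + (1.0 - lam) * hmm_scores

if __name__ == "__main__":
    # Toy example: 3 frames, 4 HMM states.
    rng = np.random.default_rng(0)
    nn = rng.random((3, 4))
    hmm = rng.random((3, 4))
    print(combined_emission_scores(nn, hmm, lam=0.6))
```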

Cited by 24 publications (12 citation statements)
References 16 publications
“…These investigations, mainly based on model averaging, showed some success when combining context-independent hybrid systems based on multi-layer perceptrons (MLPs) and recurrent neural networks (RNNs). Dugast et al [16] combined posterior probability estimates obtained from a time-delay neural network with the likelihoods generated by an HMM system with state emissions modelled by a mixture of Laplacians. Similar approaches combining scaled likelihoods produced by a two-layer MLP and HMM-GMM likelihoods were also investigated [17].…”
Section: Relation To Prior Work
confidence: 99%
“…The proposed technique allowed for a significant gain in performance for some classes of confusable letters (e.g., 'B', 'D', and 'V'), from 491 spellings of names collected over the telephone channel. The approach proposed by [31] is not truly a hybrid, since it uses a standard HMM (trained with the Viterbi algorithm under the ML criterion), which is used in parallel with connectionist models by combining the estimates of the emission probabilities (likelihoods) provided by the HMM with the normalized scores obtained with the ANNs. The linear combination scheme is the following:…”
Section: Other Approaches
confidence: 99%
“…Parallel ANN/HMM state-probability estimates [31]: HMM estimates of the emission probabilities are linearly combined with scores obtained with a hierarchical mixture of TDNNs.…”
Section: Model
confidence: 99%
“…A part of the prosodic information is obviously linguistic, but the rest of it conveys non-linguistic information. In our flowchart this branch is fuzzier than the other: levels and units are less clearly defined.…”
Section: Hierarchical Organization Of Speech Perception
confidence: 99%