“…The revolution took place in 2010, following close collaboration among academic and industrial research groups, including the University of Toronto, Microsoft, and IBM [1,4,5]. This research found that very significant performance improvements could be achieved with the NN-based hybrid approach through a few novel techniques and design choices: (1) extending NNs to DNNs, i.e., using a large number of hidden layers (usually 4 to 8); (2) employing appropriate initialization methods, e.g., pre-training with restricted Boltzmann machines (RBMs); and (3) using fine-grained NN targets, e.g., context-dependent states.…”
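The three design choices listed above can be sketched as a minimal feedforward acoustic model. The following NumPy sketch is purely illustrative: the feature dimension, layer width, and number of context-dependent (senone) output states are assumed values, and RBM pre-training is replaced by random initialization for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed dimensions (illustrative only): 39-dim acoustic features,
# 6 hidden layers (within the typical 4-to-8 range mentioned above),
# and 2000 context-dependent (senone) output states.
FEAT_DIM, HIDDEN, N_LAYERS, N_STATES = 39, 512, 6, 2000

def init_dnn():
    """Randomly initialize all layers; in the original recipe each
    hidden layer would instead be pre-trained as a restricted
    Boltzmann machine before supervised fine-tuning."""
    dims = [FEAT_DIM] + [HIDDEN] * N_LAYERS + [N_STATES]
    return [(rng.normal(0.0, 0.1, (i, o)), np.zeros(o))
            for i, o in zip(dims[:-1], dims[1:])]

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(params, x):
    """Map a batch of feature frames to posteriors over the
    context-dependent states (softmax output layer)."""
    for w, b in params[:-1]:
        x = sigmoid(x @ w + b)          # hidden layers
    w, b = params[-1]
    logits = x @ w + b
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

params = init_dnn()
posteriors = forward(params, rng.normal(size=(8, FEAT_DIM)))
print(posteriors.shape)  # (8, 2000)
```

In the hybrid setup these per-frame senone posteriors would then be converted to scaled likelihoods and combined with an HMM decoder; that part is omitted here.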