Deep Belief Networks using discriminative features for phone recognition

Mohamed, Abdelrahman; Sainath, Tara N.; Dahl, George E.; Ramabhadran, Bhuvana; Hinton, Geoffrey E.; Picheny, Michael

doi:10.1109/icassp.2011.5947494

Cited by 335 publications

(274 citation statements)

References 13 publications

Citation statements: 268 mentioning

“…In recent years, deep learning models have been used for phonetic classification and recognition on a variety of speech tasks and showed promising results [7,8]. A Deep Boltzmann Machine is a network of symmetrically coupled stochastic binary units [6,9].…”

Section: Deep Boltzmann Machines (mentioning, confidence: 99%)

Resource configurable spoken query detection using Deep Boltzmann Machines

Zhang, Salakhutdinov, Chang, et al. (2012)

2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)


In this paper we present a spoken query detection method based on posteriorgrams generated from Deep Boltzmann Machines (DBMs). The proposed method can be deployed in both semi-supervised and unsupervised training scenarios. The DBM-based posteriorgrams were evaluated on a series of keyword spotting tasks using the TIMIT speech corpus. In unsupervised training conditions, the DBM approach improved upon our previous best unsupervised keyword detection performance using Gaussian mixture model-based posteriorgrams by over 10%. When limited amounts of labeled data were incorporated into training, the DBM approach required less than one third of the annotated data to achieve performance comparable to that of a system trained on all of the annotated data.
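The abstract does not spell out how a spoken query is matched against test speech. A common choice in posteriorgram-based query-by-example detection is dynamic time warping (DTW) with a negative-log inner-product frame distance; the sketch below assumes that choice, represents each posteriorgram as a list of per-frame probability vectors, and uses hypothetical helper names (`frame_distance`, `subsequence_dtw`). It illustrates the general technique only and is not the authors' implementation.

```python
import math
from typing import List, Sequence

Posteriorgram = List[List[float]]  # one posterior probability vector per frame


def frame_distance(p: Sequence[float], q: Sequence[float], eps: float = 1e-10) -> float:
    """Negative log inner product of two posterior vectors (smaller = more similar)."""
    dot = sum(a * b for a, b in zip(p, q))
    return -math.log(max(dot, eps))  # eps avoids log(0) for disjoint posteriors


def subsequence_dtw(query: Posteriorgram, utterance: Posteriorgram) -> float:
    """Length-normalised cost of aligning the whole query to the best-matching
    contiguous region of the utterance (hypothetical helper)."""
    n, m = len(query), len(utterance)
    inf = float("inf")
    # cost[i][j]: accumulated cost with query frame i aligned to utterance frame j
    cost = [[inf] * m for _ in range(n)]
    steps = [[0] * m for _ in range(n)]  # path lengths, used for normalisation
    for j in range(m):
        # The match may start at any utterance frame (free start).
        cost[0][j] = frame_distance(query[0], utterance[j])
        steps[0][j] = 1
    for i in range(1, n):
        for j in range(m):
            d = frame_distance(query[i], utterance[j])
            candidates = [(cost[i - 1][j], steps[i - 1][j])]  # vertical move
            if j > 0:
                candidates.append((cost[i][j - 1], steps[i][j - 1]))          # horizontal
                candidates.append((cost[i - 1][j - 1], steps[i - 1][j - 1]))  # diagonal
            best_cost, best_steps = min(candidates, key=lambda c: c[0])
            cost[i][j] = best_cost + d
            steps[i][j] = best_steps + 1
    # The match may end at any utterance frame (free end).
    return min(cost[n - 1][j] / steps[n - 1][j] for j in range(m))
```

Lower scores indicate a better match; a detector would rank or threshold these scores across the utterances being searched.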


“…In their seminal work, Mohamed et al [1] proposed to use a system composed of many layers of logistic units. In order to overcome the notoriously difficult problem of optimizing very deep networks, they proposed to use a layer-wise unsupervised learning algorithm, called Restricted Boltzmann Machine (RBM) [2], as a way to provide a sensible initialization and they demonstrated significant improvements over the baseline GMM.…”

Section: Introduction (mentioning, confidence: 99%)

On rectified linear units for speech processing

Zeiler, Ranzato, Monga, et al. (2013)

2013 IEEE International Conference on Acoustics, Speech and Signal Processing



Deep neural networks have recently become the gold standard for acoustic modeling in speech recognition systems. The key computational unit of a deep network is a linear projection followed by a point-wise non-linearity, which is typically a logistic function. In this work, we show that we can improve generalization and make training of deep networks faster and simpler by substituting the logistic units with rectified linear units. These units are linear when their input is positive and zero otherwise. In a supervised setting, we can successfully train very deep nets from random initialization on a large vocabulary speech recognition task achieving lower word error rates than using a logistic network with the same topology. Similarly in an unsupervised setting, we show how we can learn sparse features that can be useful for discriminative tasks. All our experiments are executed in a distributed environment using several hundred machines and several hundred hours of speech data.
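The two non-linearities compared in the abstract can be written down directly. The sketch below (plain Python; function names are illustrative, not from the paper) defines a logistic unit and a rectified linear unit, a layer as a linear projection followed by a point-wise non-linearity, and the derivative of each unit. The derivatives show one commonly cited reason ReLU networks are easier to train: the logistic gradient never exceeds 0.25 and vanishes for large inputs, while the ReLU gradient is exactly 1 wherever the unit is active.

```python
import math


def logistic(x: float) -> float:
    """Logistic (sigmoid) unit."""
    return 1.0 / (1.0 + math.exp(-x))


def relu(x: float) -> float:
    """Rectified linear unit: linear for positive input, zero otherwise."""
    return x if x > 0.0 else 0.0


def logistic_grad(x: float) -> float:
    s = logistic(x)
    return s * (1.0 - s)  # peaks at 0.25 when x == 0, tends to 0 as |x| grows


def relu_grad(x: float) -> float:
    return 1.0 if x > 0.0 else 0.0  # derivative taken as 0 at x == 0


def layer(inputs, weights, bias, nonlinearity):
    """Linear projection (one weight row per unit) followed by a point-wise non-linearity."""
    return [nonlinearity(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, bias)]
```

Passing `relu` instead of `logistic` to `layer` is the only change needed to compare the two unit types on the same topology.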

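The passage quoted from this paper refers to greedy layer-wise pretraining with Restricted Boltzmann Machines. A minimal sketch follows, assuming binary visible and hidden units and one-step contrastive divergence (CD-1) as the training rule. Speech DBNs usually use Gaussian visible units in the first layer for real-valued acoustic features, which this sketch omits for brevity. The names `BinaryRBM` and `pretrain_stack` are hypothetical; the learned weights would serve as the initialization of a network that is then trained discriminatively.

```python
import math
import random
from typing import List


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class BinaryRBM:
    """Bernoulli-Bernoulli RBM trained with one-step contrastive divergence (CD-1)."""

    def __init__(self, n_visible: int, n_hidden: int, seed: int = 0):
        self.rng = random.Random(seed)
        self.W = [[self.rng.gauss(0.0, 0.01) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b = [0.0] * n_visible  # visible biases
        self.c = [0.0] * n_hidden   # hidden biases

    def hidden_probs(self, v: List[float]) -> List[float]:
        return [sigmoid(self.c[j] + sum(v[i] * self.W[i][j] for i in range(len(v))))
                for j in range(len(self.c))]

    def visible_probs(self, h: List[float]) -> List[float]:
        return [sigmoid(self.b[i] + sum(h[j] * self.W[i][j] for j in range(len(h))))
                for i in range(len(self.b))]

    def sample(self, probs: List[float]) -> List[float]:
        return [1.0 if self.rng.random() < p else 0.0 for p in probs]

    def cd1_update(self, v0: List[float], lr: float = 0.1) -> None:
        ph0 = self.hidden_probs(v0)   # positive phase
        h0 = self.sample(ph0)
        pv1 = self.visible_probs(h0)  # mean-field reconstruction
        ph1 = self.hidden_probs(pv1)  # negative phase
        for i in range(len(self.b)):
            for j in range(len(self.c)):
                self.W[i][j] += lr * (v0[i] * ph0[j] - pv1[i] * ph1[j])
            self.b[i] += lr * (v0[i] - pv1[i])
        for j in range(len(self.c)):
            self.c[j] += lr * (ph0[j] - ph1[j])


def pretrain_stack(data: List[List[float]], layer_sizes: List[int],
                   epochs: int = 20, lr: float = 0.1) -> List[BinaryRBM]:
    """Greedy layer-wise pretraining: each RBM is trained on the hidden
    probabilities produced by the RBM below it."""
    rbms: List[BinaryRBM] = []
    inputs = data
    for n_hidden in layer_sizes:
        rbm = BinaryRBM(len(inputs[0]), n_hidden, seed=len(rbms))
        for _ in range(epochs):
            for v in inputs:
                rbm.cd1_update(v, lr)
        rbms.append(rbm)
        inputs = [rbm.hidden_probs(v) for v in inputs]
    return rbms
```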

“…7 minutes), using 3-layer NN mapping brings more benefit. Recently, deep neural networks have been applied successfully for speech recognition [20]-[23]. They show significant improvements over 3-layer NNs.…”

Section: Discussion On Mapping Structure (mentioning, confidence: 99%)

Cross-Lingual Phone Mapping for Large Vocabulary Speech Recognition of Under-Resourced Languages

Hai, Xiao, Chng, et al. (2014)

IEICE Trans. Inf. & Syst.


This paper presents a novel acoustic modeling technique for large vocabulary automatic speech recognition of under-resourced languages that leverages well-trained acoustic models of other languages (called source languages). The idea is to use a source language acoustic model to score the acoustic features of the target language, and then map these scores to the posteriors of the target phones using a classifier. The target phone posteriors are then used for decoding in the usual way of hybrid acoustic modeling. The motivation for this strategy is that human languages usually share similar phone sets, so it may be easier to predict the target phone posteriors from the scores generated by source language acoustic models than to train an under-resourced language acoustic model from scratch. The proposed method is evaluated on the Aurora-4 task with less than 1 hour of training data. Two types of source language acoustic models are considered, i.e., hybrid HMM/MLP and conventional HMM/GMM models. In addition, we also use triphone tied states in the mapping. Our experimental results show that by leveraging well-trained Malay and Hungarian acoustic models, we achieved a 9.0% word error rate (WER) given 55 minutes of English training data. This is close to the WER of 7.9% obtained by using the full 15 hours of training data and much better than the WER of 14.4% obtained by conventional acoustic modeling techniques with the same 55 minutes of training data.
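The mapping step in this abstract, from source-language acoustic scores to target-phone posteriors, can be illustrated with the simplest possible classifier. The sketch assumes a single linear-softmax layer trained by stochastic gradient descent on cross-entropy, whereas the passage quoted from this paper refers to 3-layer NN mappings; `PhoneMapper` and the toy data in the usage note are hypothetical.

```python
import math
import random
from typing import List, Sequence


def softmax(z: Sequence[float]) -> List[float]:
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


class PhoneMapper:
    """Hypothetical linear-softmax mapping from source-language scores
    to target-phone posteriors."""

    def __init__(self, n_source: int, n_target: int, seed: int = 0):
        rng = random.Random(seed)
        self.W = [[rng.gauss(0.0, 0.01) for _ in range(n_source)]
                  for _ in range(n_target)]
        self.b = [0.0] * n_target

    def posteriors(self, scores: Sequence[float]) -> List[float]:
        logits = [sum(w * s for w, s in zip(row, scores)) + b
                  for row, b in zip(self.W, self.b)]
        return softmax(logits)

    def train_step(self, scores: Sequence[float], target: int, lr: float = 0.5) -> None:
        p = self.posteriors(scores)
        for k in range(len(self.W)):
            # Gradient of cross-entropy with respect to logit k.
            g = p[k] - (1.0 if k == target else 0.0)
            for i, s in enumerate(scores):
                self.W[k][i] -= lr * g * s
            self.b[k] -= lr * g
```

For example, training on frames whose source scores are `[1, 0, 0]` (target phone 0) and `[0, 1, 0]` or `[0, 0, 1]` (both target phone 1) lets two source phones map onto one target phone, which is the kind of many-to-one relationship a shared phone inventory suggests.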
