2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2015.7178787
Cluster adaptive training for deep neural network

Cited by 64 publications (40 citation statements) | References 21 publications
“…Other subspace methods include cluster adaptive training (CAT) [113], [114] and factorized hidden layer (FHL) [115], [116], where the transformations are confined to the speaker subspace. Similar to eigenvoice [117] or cluster adaptive training [118] in the Gaussian mixture model era, CAT [113], [114] in DNN training constructs multiple DNNs to form the bases of a canonical parametric space. During adaptation, an interpolation vector associated with a target speaker or environment is estimated online to combine the multiple DNN bases into a single adapted DNN.…”
Section: A. Acoustic Model Adaptation
confidence: 99%
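The combination step described in this statement reduces to a weighted sum of canonical weight matrices. Below is a minimal NumPy sketch of that step; the function name, shapes, and basis count are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def cat_adapted_weight(bases, interp):
    """Combine C canonical weight bases into one adapted weight matrix.

    bases:  array of shape (C, out_dim, in_dim) -- canonical DNN bases
    interp: array of shape (C,) -- speaker/environment interpolation vector
    """
    # Adapted weight is a weighted sum of the bases: W = sum_c lambda_c * W_c
    return np.tensordot(interp, bases, axes=1)

# Toy usage (hypothetical sizes): 4 bases for a 512x512 hidden layer
rng = np.random.default_rng(0)
bases = rng.standard_normal((4, 512, 512)) * 0.01
interp = rng.standard_normal(4)
W_adapted = cat_adapted_weight(bases, interp)   # shape (512, 512)
```

Note that once the weighted sum is precomputed, the adapted network has the same topology, and hence essentially the same run-time cost, as an unadapted DNN.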
“…An issue in the CAT-style methods is that the bases are full-rank matrices, which require a very large amount of training data. Therefore, the number of bases in CAT is usually constrained to a few [113], [114]. A solution is to use FHL [115], [116], which constrains the bases to be rank-1 matrices.…”
Section: A. Acoustic Model Adaptation
confidence: 99%
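To make the data-cost argument concrete, the sketch below compares per-layer parameter counts for full-rank CAT bases against rank-1 FHL bases, and shows how a rank-1 combination sum_c lambda_c * u_c v_c^T would be assembled; the layer width, basis counts, and variable names are all assumptions for illustration.

```python
import numpy as np

d = 512                 # assumed layer width (square weight for simplicity)
C_cat, C_fhl = 4, 64    # hypothetical basis counts

# CAT: each basis is a full-rank d x d matrix
params_cat = C_cat * d * d        # 4 * 512 * 512 ~ 1.0M params per layer

# FHL: each basis is rank-1, W_c = u_c v_c^T, so only 2d params each
params_fhl = C_fhl * 2 * d        # 64 * 2 * 512 ~ 65K params per layer

rng = np.random.default_rng(0)
u = rng.standard_normal((C_fhl, d))
v = rng.standard_normal((C_fhl, d))
lam = rng.standard_normal(C_fhl)
# Adapted offset: sum_c lambda_c * u_c v_c^T (applied alongside a shared weight)
delta_W = np.einsum('c,ci,cj->ij', lam, u, v)   # shape (d, d)
```

The rank-1 constraint is what lets FHL afford many more bases per layer for the same parameter budget, which is the trade-off the quoted statement points to.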
“…During adaptation, an interpolation vector, specific to a particular acoustic condition, is used to combine the multiple sub-networks into a single adapted DNN. We refer to this factorized DNN training as cluster adaptive training (CAT), following [11]. CAT was initially proposed for GMM-HMM acoustic models [24], and later extended to DNNs by introducing multiple canonical weight matrices for a DNN layer, as depicted in Fig.…”
Section: Cluster Adaptive Training
confidence: 99%
“…In [11], [12] and [13], multiple weight matrices or sub-networks are constructed to form the bases of a canonical parametric space. During adaptation, an interpolation vector, specific to a particular acoustic condition, is used to combine the multiple sub-networks into a single adapted DNN.…”
Section: Cluster Adaptive Training
confidence: 99%
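The statements above say the interpolation vector is estimated on a target speaker's or condition's data while the canonical bases stay fixed. A hedged PyTorch sketch of that step follows, using plain gradient descent on a cross-entropy objective; the function name, shapes, and hyperparameters are assumptions, and the actual estimation procedure may differ in detail from this sketch.

```python
import torch

def estimate_interp(bases, x, y, steps=50, lr=0.1):
    """Estimate an interpolation vector on adaptation data.

    bases: (C, out_dim, in_dim) canonical weights, kept frozen
    x:     (N, in_dim) adaptation features
    y:     (N,) target labels (e.g. senone states)
    """
    C = bases.shape[0]
    lam = torch.zeros(C, requires_grad=True)   # only lambda is trained
    opt = torch.optim.SGD([lam], lr=lr)
    for _ in range(steps):
        W = torch.einsum('c,coi->oi', lam, bases)  # adapted weight matrix
        logits = x @ W.t()
        loss = torch.nn.functional.cross_entropy(logits, y)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return lam.detach()

# Toy usage with hypothetical sizes
bases = torch.randn(4, 10, 20) * 0.1
x, y = torch.randn(32, 20), torch.randint(0, 10, (32,))
lam = estimate_interp(bases, x, y)
```

Because only the C-dimensional vector is estimated, adaptation needs very little target data compared with retraining the full network, which is the motivation the citing papers give for the subspace formulation.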