Interspeech 2016
DOI: 10.21437/interspeech.2016-1230
On the Use of Gaussian Mixture Model Framework to Improve Speaker Adaptation of Deep Neural Network Acoustic Models

Abstract: In this paper we investigate GMM-derived (GMMD) features for adaptation of deep neural network (DNN) acoustic models. The adaptation of a DNN trained on GMMD features is done through maximum a posteriori (MAP) adaptation of the auxiliary GMM model used for GMMD feature extraction. We explore fusion of the adapted GMMD features with conventional features, such as bottleneck and MFCC features, in two different neural network architectures: DNN and time-delay neural network (TDNN). We analyze and compar…
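The GMMD features described in the abstract can be sketched as the per-component log-likelihoods of each acoustic frame under an auxiliary GMM, concatenated with the conventional features. The following is a minimal illustrative sketch, not the paper's implementation: function names, the diagonal-covariance assumption, and the toy dimensions are all assumptions.

```python
import numpy as np

def log_gauss_diag(x, mean, var):
    """Log-density of frame x under one diagonal-covariance Gaussian."""
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)

def gmmd_features(frame, means, variances, weights):
    """GMM-derived feature vector: weighted log-likelihood of the frame
    under each component of the auxiliary GMM."""
    return np.array([
        np.log(w) + log_gauss_diag(frame, m, v)
        for m, v, w in zip(means, variances, weights)
    ])

# Toy example: one 13-dim MFCC frame scored against a 4-component GMM.
rng = np.random.default_rng(0)
frame = rng.standard_normal(13)
means = rng.standard_normal((4, 13))
variances = np.ones((4, 13))
weights = np.full(4, 0.25)

feats = gmmd_features(frame, means, variances, weights)
fused = np.concatenate([frame, feats])  # fusion with conventional features
```

In the papers' setup the auxiliary GMM is speaker-adapted with MAP, so `means`/`variances` would be the adapted parameters rather than speaker-independent ones.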

Cited by 10 publications (8 citation statements). References 77 publications (124 reference statements).
“…The use of GMM-derived (GMMD) features has been shown to provide an efficient technique for neural network AM adaptation across different adaptation tasks, such as speaker adaptation [27,30,40] and environment or noise adaptation [31,32]. In this section, we describe the standard (Section 3.1) and improved (Section 3.2) SAT procedures for neural network AMs using GMMD features.…”
Section: Improved SAT Using GMMD Features
confidence: 99%
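The MAP adaptation step that these citing papers build on updates each auxiliary-GMM mean toward the speaker's adaptation data, interpolated with the prior mean by a relevance factor. A minimal sketch of the standard MAP mean update, with illustrative names and toy values (not taken from the paper):

```python
import numpy as np

def map_adapt_means(means, frames, responsibilities, tau=10.0):
    """MAP re-estimation of GMM means from adaptation data.

    means:            (K, D) prior (speaker-independent) component means
    frames:           (T, D) adaptation frames
    responsibilities: (T, K) posterior of component k given frame t
    tau:              relevance factor controlling the prior's weight
    """
    occ = responsibilities.sum(axis=0)           # gamma_k: soft occupancy counts
    weighted = responsibilities.T @ frames       # sum_t gamma_tk * x_t, shape (K, D)
    return (tau * means + weighted) / (tau + occ)[:, None]

# Toy adaptation: 2 components, 3-dim features, 5 frames all assigned
# to component 0; component 1 sees no data and keeps its prior mean.
means = np.zeros((2, 3))
frames = np.ones((5, 3))
resp = np.tile([1.0, 0.0], (5, 1))
adapted = map_adapt_means(means, frames, resp, tau=5.0)
```

Components with little adaptation data stay close to the speaker-independent prior, which is what makes MAP attractive for the small per-speaker data regime these papers target.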
“…Characteristics of the obtained data sets are given in Table 1. A more detailed description of the data can be found in [40]. For evaluation, a 4-gram language model (LM) with a 152K-word vocabulary was used.…”
Section: Data Sets
confidence: 99%
“…This idea led to the proposed method of adaptation based on GMM-derived features, described in detail in the papers [24][25][26]. The scheme of the method's application in its original variant (for speaker adaptation 2 ) is depicted in Fig.…”
Section: Adaptation Based On GMM-derived Features
confidence: 99%
“…As a result, the main DNN-HMM acoustic model remains unchanged. A number of experiments described in [24][25][26] show that although the SI-DNN-HMM on GMM-derived features performs worse than on MFCCs, its adaptation is extremely efficient. An illustration of the latter statement (applied to speaker adaptation) is shown in Fig.…”
Section: Adaptation Based On GMM-derived Features
confidence: 99%
“…They are concatenated with conventional acoustic features and used for DNN training and decoding. The benefit of GMM-derived features has recently been shown in [26]-[28] in the context of speaker adaptation of DNN-based acoustic models. The authors in [29] also used GMM log-likelihoods as input features (without conventional acoustic features) for adaptation to stationary noise.…”
Section: Introduction
confidence: 99%