2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2014.6853585

On combining DNN and GMM with unsupervised speaker adaptation for robust automatic speech recognition

Abstract: Recently, the context-dependent Deep Neural Network (CD-DNN) has been found to significantly outperform the Gaussian Mixture Model (GMM) on various large vocabulary continuous speech recognition tasks. Unlike the GMM approach, there is no meaningful interpretation of the DNN parameters, which makes it difficult to devise effective adaptation methods for DNNs. Furthermore, DNN parameter estimation is based on discriminative criteria, which are more sensitive to label errors and therefore less reliable for unsupervised …

Cited by 15 publications (10 citation statements), citing years 2015–2018 | References 13 publications

“…For instance, a group of adaptation transform functions could be utilized to capture more structures of adaptation data and model parameters in DNN model parameter adaptation. Since the GMM-HMM framework is based on the generative training paradigm, robustness for unsupervised adaptation could also be enabled by combining GMM and DNN to enhance the ASR performance (Liu and Sim, 2014). In addition, DNN-HMM systems involve enormous quantities of neural network parameters, and special optimization processes are often necessary to avoid overfitting to the limited adaptation data.…”
Section: Discussion
confidence: 99%
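As an illustration of the GMM/DNN combination idea referenced in this excerpt, here is a minimal sketch of a frame-level log-linear combination of the two acoustic scores. The function name, the interpolation weight kappa, and the posterior-to-likelihood conversion are illustrative assumptions, not the exact method of Liu and Sim (2014).

```python
import numpy as np

def combined_log_score(dnn_log_post, log_prior, gmm_log_lik, kappa=0.5):
    """Frame-level log-linear combination of DNN and GMM acoustic scores.

    dnn_log_post : (T, S) log posteriors log p(s | x_t) from the DNN
    log_prior    : (S,)   log state priors log p(s); subtracting them
                   converts posteriors to scaled likelihoods (the usual
                   hybrid-DNN "pseudo-likelihood" trick)
    gmm_log_lik  : (T, S) log likelihoods log p(x_t | s) from the GMM
    kappa        : illustrative interpolation weight on the DNN stream
    """
    dnn_log_lik = dnn_log_post - log_prior[None, :]
    return kappa * dnn_log_lik + (1.0 - kappa) * gmm_log_lik
```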
“…Since DNN parameter estimation is based on a discriminative criterion, adaptation performance is sensitive to label errors. A combination of GMM and DNN has also been shown to enhance ASR performance (Liu and Sim, 2014), since the generative training paradigm of the GMM-HMM framework makes unsupervised adaptation more robust. This study evaluates the ATG-ESSEM framework using a difficult task designed to simulate "real-world" conditions: per-utterance unsupervised adaptation under widely fluctuating SNRs.…”
Section: Introduction
confidence: 97%
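To make the robustness argument in this excerpt concrete, below is a minimal sketch of relevance-MAP adaptation of GMM means from first-pass soft alignments, one standard generative adaptation scheme (not the cited papers' exact method): because the update accumulates soft statistics and shrinks toward the prior means when occupancy is low, occasional label errors in the first pass are averaged out rather than directly optimized against. The function and the relevance factor tau are illustrative.

```python
import numpy as np

def map_adapt_means(feats, gamma, prior_means, tau=10.0):
    """MAP (relevance) adaptation of GMM means.

    feats       : (T, D) acoustic frames of one speaker/utterance
    gamma       : (T, M) soft component occupancies from a first-pass
                  alignment (which may contain recognition errors)
    prior_means : (M, D) speaker-independent component means
    tau         : relevance factor; components with little occupancy
                  stay close to the prior means, limiting the damage
                  done by erroneous first-pass labels
    """
    occ = gamma.sum(axis=0)                               # (M,) soft counts
    ml_means = (gamma.T @ feats) / np.maximum(occ, 1e-8)[:, None]
    w = (occ / (occ + tau))[:, None]                      # shrinkage weight
    return w * ml_means + (1.0 - w) * prior_means
```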
“…Similar ideas are also presented in [Swietojanski et al., 2013]. Other methods include temporally varying weight regression [Liu & Sim, 2014] and GMMD features [Tomashenko & Khokhlov, 2014; Tomashenko et al., 2016d; Tomashenko & Khokhlov, 2015].…”
Section: Adaptation Based on GMMs
confidence: 99%
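As a rough illustration of GMM-derived (GMMD) features mentioned above, the sketch below computes per-frame log-likelihoods of each diagonal-covariance GMM component, which can then be appended to the standard acoustic features before the DNN input. This is only the general idea; the exact feature definition in Tomashenko & Khokhlov (2014) may differ.

```python
import numpy as np

def gmm_component_loglik(feats, means, variances, weights):
    """Per-frame, per-component log-likelihoods of a diagonal GMM.

    feats     : (T, D) acoustic frames
    means     : (M, D) component means
    variances : (M, D) diagonal covariances
    weights   : (M,)   component weights
    Returns (T, M) values log( w_m * N(x_t; mu_m, diag(var_m)) ).
    """
    D = feats.shape[1]
    diff2 = ((feats[:, None, :] - means[None, :, :]) ** 2 / variances).sum(axis=2)
    log_det = np.log(variances).sum(axis=1)               # (M,)
    ll = -0.5 * (diff2 + log_det + D * np.log(2.0 * np.pi))
    return ll + np.log(weights)[None, :]

# Illustrative use: augment the DNN input with GMMD features.
# dnn_input = np.concatenate([feats, gmm_component_loglik(feats, mu, var, w)], axis=1)
```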
“…However, various adaptation algorithms that have been developed for GMM-HMM systems cannot be easily applied to DNNs because of the different nature of these models. Many new adaptation methods have recently been developed for DNNs, and a few of them [1][2][3][4][5] take advantage of the robust adaptability of GMMs. However, there is no universal method for efficiently transferring all adaptation algorithms from the GMM framework to DNN models.…”
Section: Introduction
confidence: 99%
“…Most of the existing methods for adapting DNN models can be classified into several types: (1) linear transformation, (2) regularization techniques, (3) auxiliary features, (4) multi-task learning, and (5) combining GMM and DNN models. Linear transformation can be applied at different levels of the DNN system: to the input features, as in linear input network transformation (LIN) [6] or feature-space discriminative linear regression (fDLR); to the activations of hidden layers, as in linear hidden network transformation (LHN) [6]; or to the softmax layer, as in the linear output network (LON) or in output-feature discriminative linear regression.…”
Section: Introduction
confidence: 99%
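To ground the LIN description in the excerpt above, here is a minimal sketch of a linear input network: an identity-initialised affine transform of the input features. In practice W and b would be trained by back-propagation on adaptation data while all DNN weights stay frozen; the training loop is omitted and the class name is illustrative.

```python
import numpy as np

class LinearInputNetwork:
    """LIN sketch: a speaker-dependent affine transform of the input.

    Initialised to the identity so that, before any adaptation, the
    transformed features equal the originals and the unadapted system
    is recovered exactly. During adaptation, W and b are updated by
    back-propagation while the DNN weights stay frozen.
    """

    def __init__(self, dim):
        self.W = np.eye(dim)        # identity start: y = x initially
        self.b = np.zeros(dim)

    def __call__(self, feats):      # feats: (T, dim)
        return feats @ self.W.T + self.b
```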