2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2016.7472632
Bottleneck linear transformation network adaptation for speaker adaptive training-based hybrid DNN-HMM speech recognizer

Cited by 3 publications (3 citation statements) · References 21 publications
“…However, a large network is obviously undesirable from the viewpoints of computational load and memory size; it is also unfavorable from the viewpoint of controlling training robustness to unseen data. To meet this requirement for finding a small, necessary, and sufficient DNN structure, several approaches have reshaped the network structure [3,4,5] or pruned the network nodes [6]. However, these methods assumed retraining or adapting a size-reduced network for high discriminative power.…”
Section: Introduction
confidence: 99%
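The excerpt above mentions pruning network nodes as one route to a smaller DNN. A minimal, hypothetical sketch of magnitude-based node pruning is given below; the threshold, layer sizes, and scoring rule are illustrative assumptions, not the method of any cited paper.

```python
import numpy as np

# Illustrative sketch: prune hidden nodes of one layer by the L2 norm
# of their incoming weights (an assumed scoring rule for demonstration).
rng = np.random.default_rng(1)
W = rng.standard_normal((8, 16))  # 8 hidden nodes, 16 inputs

# Score each node; keep the top half.
node_scores = np.linalg.norm(W, axis=1)
keep = np.argsort(node_scores)[-4:]
W_pruned = W[keep]

# As the excerpt notes, a pruned network would then typically be
# retrained or adapted to recover discriminative power.
print(W_pruned.shape)  # (4, 16)
```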
“…In model-based speaker adaptive training this may be done by splitting the weights of the acoustic model into a speaker-independent and a speaker-dependent set. During training, a copy of the speaker-dependent weights is maintained and optimised for each speaker separately [12,13,14]. Here, we take an alternative approach: Instead of maintaining and optimising a separate copy of speaker-dependent weights for each speaker we embed speaker adaptation directly into the acoustic model training using a meta-learning approach in order to find a good initialisation for speaker-dependent weights.…”
Section: Speaker Adaptive Training As a Meta-learning Task
confidence: 99%
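The weight splitting described in this excerpt can be sketched as follows. This is a hypothetical toy illustration, assuming a single shared speaker-independent layer and one speaker-dependent linear transform per speaker, initialised to identity; all names and dimensions are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
feat_dim, hidden_dim = 4, 3

# Speaker-independent weights: shared across all speakers.
W_si = rng.standard_normal((hidden_dim, feat_dim))

# Speaker-dependent weights: a separate copy per speaker, maintained
# and optimised individually during training (here just initialised
# to identity transforms).
speakers = ["spk1", "spk2"]
W_sd = {s: np.eye(feat_dim) for s in speakers}

def forward(x, speaker):
    """Apply the speaker-dependent transform, then the shared layer."""
    return np.tanh(W_si @ (W_sd[speaker] @ x))

x = rng.standard_normal(feat_dim)
y1 = forward(x, "spk1")
y2 = forward(x, "spk2")
# Before any speaker-specific optimisation, the identity SD transforms
# leave all speakers with the same canonical model output.
print(np.allclose(y1, y2))  # True
```

During training, only `W_sd[speaker]` would be updated on that speaker's data, factoring speaker variation out of the shared `W_si`.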
“…In model-based speaker adaptive training, the acoustic model is parameterised as speakerdependent and speaker-independent weights. A copy of the speaker-dependent weights is maintained and optimised separately for each speaker during the training process in order to factor out speaker variation from the canonical speakerindependent acoustic model [12,13,14]. Finally, all hybrid approaches can be considered as speaker adaptive training because they provide information about speaker identity, which allows the acoustic model to easily remove speaker variation from the input features [9,10,15].…”
Section: Introduction
confidence: 99%