2015
DOI: 10.1109/taslp.2015.2422573

Multi-task Learning of Deep Neural Networks for Low-resource Speech Recognition


Cited by 62 publications (64 citation statements: 2 supporting, 62 mentioning, 0 contrasting). Citing publications span 2016–2023. References 51 publications.
“…However, the multilingual training using universal phone set does not show much improvement and in most cases it is even worse. The result is consistent with previous work [3,9,10]. Although the IPA-based multilingual modelling enjoys richer data resources, it has a larger set of units to model as well.…”
Section: Results (supporting)
confidence: 92%
“…Thus there is sufficient data to train the universal phones. However, it is usually found that the performance of the universal acoustic models is worse than the language-specific acoustic models unless the amount of training data for the target language is really small [9,10]. Although the universal model may share data among various languages, mixture of data creates more variation especially for those identical IPA symbols shared among different languages.…”
Section: Introduction (mentioning)
confidence: 99%
“…The predominant one being applied to ASR is heterogeneous transfer learning (Wang and Zheng, 2015) which involves training a base model on multiple languages (and tasks) simultaneously. While this achieves some competitive results (Chen and Mak, 2015; Knill et al., 2014), it still requires large amounts of data to yield robust improvements (Heigold et al., 2013).…”
Section: Related Work (mentioning)
confidence: 99%
“…In the MTL framework, several related tasks are jointly trained with shared hidden layers to improve the generalization power of each task [6]- [10]. In the proposed approach, the main task of VAD is jointly trained with a subsidiary task of feature enhancement.…”
Section: Introduction (mentioning)
confidence: 99%
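
The excerpt above describes the core idea behind the cited paper: several related tasks are trained jointly through shared hidden layers, with each task keeping its own output layer. The sketch below is only a rough illustration of that idea, not the authors' actual model or code; the layer sizes, task names ("phone_states", "graphemes"), and the simple summed-loss training step are assumptions made for the example.

```python
import torch
import torch.nn as nn

class MultiTaskDNN(nn.Module):
    """Minimal multi-task DNN: shared hidden layers, one output head per task.

    Hypothetical sizes and task names; the cited works use their own setups.
    """
    def __init__(self, input_dim, hidden_dim, num_hidden, task_output_dims):
        super().__init__()
        layers, dim = [], input_dim
        for _ in range(num_hidden):                      # shared hidden stack
            layers += [nn.Linear(dim, hidden_dim), nn.Sigmoid()]
            dim = hidden_dim
        self.shared = nn.Sequential(*layers)
        # one linear output layer per task (softmax applied via the loss)
        self.heads = nn.ModuleDict(
            {name: nn.Linear(hidden_dim, out_dim)
             for name, out_dim in task_output_dims.items()})

    def forward(self, x, task):
        return self.heads[task](self.shared(x))

# Toy joint-training step: each task contributes a cross-entropy loss on its
# own mini-batch, and the gradients from all tasks flow into the shared layers.
model = MultiTaskDNN(input_dim=440, hidden_dim=1024, num_hidden=4,
                     task_output_dims={"phone_states": 2000, "graphemes": 60})
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

batches = {
    "phone_states": (torch.randn(32, 440), torch.randint(0, 2000, (32,))),
    "graphemes":    (torch.randn(32, 440), torch.randint(0, 60, (32,))),
}
optimizer.zero_grad()
loss = sum(criterion(model(x, task), y) for task, (x, y) in batches.items())
loss.backward()
optimizer.step()
```

Because only the output heads are task-specific, the shared layers must learn representations useful to every task, which is the generalization effect the quoted statement refers to.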