Multilingual deep neural network based acoustic modeling for rapid language adaptation

Vu, Ngoc Thang; Imseng, David; Povey, Daniel; Motlíček, Petr; Schultz, Tanja; Bourlard, Hervé

doi:10.1109/icassp.2014.6855086

Cited by 113 publications

(75 citation statements)

References 17 publications

“…Multilingual training of deep neural network (DNN)-based ASR systems has provided some improvements in the automatic recognition of both low- and high-resourced languages [13][14][15][16][17][18][19][20][21][22]. Some of these techniques incorporate multilingual DNNs for feature extraction [13,18,23,24].…”

Section: Introduction (mentioning)

confidence: 99%

Code-switching detection using multilingual DNNS

Yılmaz, Heuvel, Leeuwen

2016

2016 IEEE Spoken Language Technology Workshop (SLT)


Automatic speech recognition (ASR) of code-switching speech requires careful handling of unexpected language switches that may occur in a single utterance. In this paper, we investigate the feasibility of using multilingually trained deep neural networks (DNN) for the ASR of Frisian speech containing code-switches to Dutch with the aim of building a robust recognizer that can handle this phenomenon. For this purpose, we train several multilingual DNN models on Frisian and two closely related languages, namely English and Dutch, to compare the impact of single-step and two-step multilingual DNN training on the recognition and code-switching detection performance. We apply bilingual DNN retraining on both target languages by varying the amount of training data belonging to the higher-resourced target language (Dutch). The recognition results show that the multilingual DNN training scheme with an initial multilingual training step followed by bilingual retraining provides recognition performance comparable to an oracle baseline recognizer that can employ language-specific acoustic models. We further show that we can detect code-switches at the word level with an equal error rate of around 17% excluding the deletions due to ASR errors.
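The abstract reports word-level code-switching detection as an equal error rate (EER). As a rough illustration only, the sketch below shows one common way to estimate an EER from per-word detection scores: sweep a threshold and take the point where the false-accept and false-reject rates are closest. The function name, the score convention (higher means "more likely a switch"), and the toy data are assumptions, not taken from the paper, and the sketch does not model the exclusion of deletions due to ASR errors.

```python
def equal_error_rate(scores, labels):
    """Estimate the EER by sweeping a decision threshold over the scores.

    scores: detection scores, higher = more likely a code-switch (assumed).
    labels: 1 for a true code-switch, 0 otherwise.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    best_gap, best_rate = None, None
    for threshold in sorted(set(scores)):
        # A word is flagged as a switch when its score reaches the threshold.
        false_accepts = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
        false_rejects = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
        far = false_accepts / negatives
        frr = false_rejects / positives
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            # Report the midpoint of the two rates at the closest crossing.
            best_gap, best_rate = gap, (far + frr) / 2
    return best_rate


# Toy, made-up scores and labels for eight words.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 0, 1, 0, 1, 0, 0]
toy_eer = equal_error_rate(toy_scores, toy_labels)
```

With real data the two rates rarely cross exactly at an observed threshold, so interpolating between neighbouring thresholds is a common refinement.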

“…As pointed earlier in Section 1, in phone-based ASR the multilingual ANN can be simply adapted by replacing the last layer with clustered CD phones of TL [14,16]. A similar approach could be employed in our scenario in which the acoustic units are the clustered CD graphemes of TL.…”

Section: Comparison To Related Approaches (mentioning)

confidence: 99%

“…The multilingual acoustic model is then adapted on target language (TL) data based on a deterministic lexical model learned on TL data. The adaptation process can also involve redefinition of acoustic unit space based on TL data [14,15,16]. In the absence of lexical resources in the literature, typically graphemes are used as subword units [17,18,19].…”

Section: Introduction (mentioning)

confidence: 99%

Improving Under-Resourced Language ASR Through Latent Subword Unit Space Discovery

Razavi, Magimai-Doss

2016

Interspeech 2016


Development of state-of-the-art automatic speech recognition (ASR) systems requires acoustic resources (i.e., transcribed speech) as well as lexical resources (i.e., phonetic lexicons). It has been shown that acoustic and lexical resource constraints can be overcome by first training an acoustic model that captures acoustic-to-multilingual phone relationships on language-independent data; and then training a lexical model that captures grapheme-to-multilingual phone relationships on the target language data. In this paper, we show that such an approach can be employed to discover a latent space of subword units for under-resourced languages, and subsequently improve the performance of the ASR system through both acoustic and lexical model adaptation. Specifically, we present two approaches to discover the latent space: (1) inference of a subset of the multilingual phone set based on the learned grapheme-to-multilingual phone relationships, and (2) derivation of automatic subword unit space based on clustering of the grapheme-to-multilingual phone relationships. Experimental studies on Scottish Gaelic, a truly under-resourced language, show that both approaches lead to significant performance improvements, with the latter approach yielding the best system.
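The citation statements for this paper describe adapting a multilingual network by replacing its last layer with one sized for the clustered context-dependent units of the target language (TL). The sketch below illustrates only that structural step, under stated assumptions: the network is represented as plain Python dictionaries, and `make_layer`, `replace_output_layer`, and all layer sizes are hypothetical names and values chosen for illustration. Training the new layer on TL data is not shown.

```python
import copy
import random


def make_layer(n_out, n_in, rng):
    """Create a randomly initialised affine layer (hypothetical helper)."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return {"W": weights, "b": [0.0] * n_out}


def replace_output_layer(model, n_target_units, seed=0):
    """Keep the multilingual hidden layers and attach a fresh TL output layer."""
    rng = random.Random(seed)
    n_hidden = len(model["hidden"][-1]["b"])  # width of the last hidden layer
    return {
        # Hidden layers are copied so the multilingual model stays untouched.
        "hidden": copy.deepcopy(model["hidden"]),
        # The new output layer has one unit per clustered TL acoustic unit.
        "output": make_layer(n_target_units, n_hidden, rng),
    }


# Toy multilingual model: 4 inputs, two hidden layers of 8 units, 120 outputs.
rng = random.Random(1)
multilingual = {
    "hidden": [make_layer(8, 4, rng), make_layer(8, 8, rng)],
    "output": make_layer(120, 8, rng),
}
adapted = replace_output_layer(multilingual, n_target_units=30)
```

In practice the new output layer (and optionally the copied hidden layers) would then be trained on the available TL data.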


“…Meanwhile, DNN is also trained to model one single universal multilingual senone set. Phones of multiple languages are all explicitly mapped to a universal phone set (e.g., IPA) [6,9]. Thus there is sufficient data to train the universal phones.…”

Section: Introduction (mentioning)

confidence: 99%

“…Therefore, the hidden layers can be trained jointly using data from multiple languages to benefit each other [3,5]. The target of the multilingual DNN can be either the universal International Phonetic Alphabet (IPA) based multilingual senones [6] or a layer consisting of separate activations for each language [3,7,8].…”

Section: Introduction (mentioning)

confidence: 99%

An Investigation of Deep Neural Networks for Multilingual Speech Recognition Training and Adaptation

Tong, Garner, Bourlard

2017

Interspeech 2017


Different training and adaptation techniques for multilingual Automatic Speech Recognition (ASR) are explored in the context of hybrid systems, exploiting Deep Neural Networks (DNN) and Hidden Markov Models (HMM). In multilingual DNN training, the hidden layers (possibly extracting bottleneck features) are usually shared across languages, and the output layer can either model multiple sets of language-specific senones or one single universal IPA-based multilingual senone set. Both architectures are investigated, exploiting and comparing different language adaptive training (LAT) techniques originating from successful DNN-based speaker-adaptation. More specifically, speaker adaptive training methods such as Cluster Adaptive Training (CAT) and Learning Hidden Unit Contribution (LHUC) are considered. In addition, a language adaptive output architecture for IPA-based universal DNN is also studied and tested. Experiments show that LAT improves the performance and adaptation on the top layer further improves the accuracy. By combining state-level minimum Bayes risk (sMBR) sequence training with LAT, we show that a language adaptively trained IPA-based universal DNN outperforms a monolingually sequence trained model.
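The first of the two architectures mentioned in the abstract, shared hidden layers with a separate set of senone outputs per language, can be pictured with the minimal forward-pass sketch below. It is an illustration under assumptions, not the authors' implementation: the class name `MultilingualDNN`, the tanh activations, the language codes, and all layer sizes are hypothetical, and training is omitted.

```python
import math
import random


def softmax(z):
    """Numerically stable softmax over a list of activations."""
    peak = max(z)
    exps = [math.exp(v - peak) for v in z]
    total = sum(exps)
    return [v / total for v in exps]


def affine(weights, bias, x):
    """Compute W x + b for a weight matrix stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def init_layer(n_out, n_in, rng):
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class MultilingualDNN:
    """Shared hidden layers plus one softmax output layer per language."""

    def __init__(self, n_in, n_hidden, senones_per_language, seed=0):
        rng = random.Random(seed)
        # Hidden layers are shared by every language.
        self.hidden = [init_layer(n_hidden, n_in, rng), init_layer(n_hidden, n_hidden, rng)]
        # Each language gets its own output layer over its own senone set.
        self.heads = {
            lang: init_layer(n_senones, n_hidden, rng)
            for lang, n_senones in senones_per_language.items()
        }

    def shared_features(self, x):
        h = x
        for weights, bias in self.hidden:
            h = [math.tanh(v) for v in affine(weights, bias, h)]
        return h

    def posteriors(self, x, lang):
        weights, bias = self.heads[lang]
        return softmax(affine(weights, bias, self.shared_features(x)))


# Toy usage with made-up sizes: 5 senones for "nl", 6 for "en".
model = MultilingualDNN(n_in=4, n_hidden=8, senones_per_language={"nl": 5, "en": 6})
frame = [0.1, -0.2, 0.3, 0.0]
nl_posteriors = model.posteriors(frame, "nl")
en_posteriors = model.posteriors(frame, "en")
```

The universal alternative described in the abstract would instead use a single output layer over one IPA-based multilingual senone set, which in this sketch corresponds to a `heads` dictionary with a single entry.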
