Joint acoustic modeling of triphones and trigraphemes by multi-task learning deep neural networks for low-resource speech recognition

Chen, Dongpeng; Mak, Brian; Leung, Cheung-Chi; Sivadas, Sunil

doi:10.1109/icassp.2014.6854673

Cited by 61 publications

(35 citation statements)

References 17 publications

“…It aligns a sequence of tokens automatically. As in traditional systems, phones, graphemes or both combined can be used as acoustic modeling units [13]. Given enough training data, even whole words can be used [14].…”

Section: RNN Based ASR Systems (mentioning)

confidence: 99%

Multilingual Adaptation of RNN Based ASR Systems

Müller

Stüker

Waibel

2018

2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

In this work, we focus on multilingual systems based on recurrent neural networks (RNNs), trained using the Connectionist Temporal Classification (CTC) loss function. Using a multilingual set of acoustic units poses difficulties. To address this issue, we proposed Language Feature Vectors (LFVs) to train language adaptive multilingual systems. Language adaptation, in contrast to speaker adaptation, needs to be applied not only on the feature level, but also to deeper layers of the network. In this work, we therefore extended our previous approach by introducing a novel technique which we call "modulation". Based on this method, we modulated the hidden layers of RNNs using LFVs. We evaluated this approach in both full and low resource conditions, as well as for grapheme and phone based systems. Lower error rates throughout the different conditions could be achieved by the use of the modulation.

“…When the multiple tasks are related but not identical, or (in the ideal case) complementary to each other, MTL models offer better generalization from training to test corpus [9]. A number of works [9,10,11] have proved MTL to be effective on speech processing tasks. Among them [11] proved MTL effective at improving model performance for under-resourced ASR.…”

Section: Multi-task Learning (mentioning)

confidence: 99%

Improved ASR for Under-resourced Languages through Multi-task Learning with Acoustic Landmarks

Lim, Yang et al. 2018

Interspeech 2018

Furui first demonstrated that the identity of both consonant and vowel can be perceived from the C-V transition; later, Stevens proposed that acoustic landmarks are the primary cues for speech perception, and that steady-state regions are secondary or supplemental. Acoustic landmarks are perceptually salient, even in a language one doesn't speak, and it has been demonstrated that non-speakers of the language can identify features such as the primary articulator of the landmark. These factors suggest a strategy for developing language-independent automatic speech recognition: landmarks can potentially be learned once from a suitably labeled corpus and rapidly applied to many other languages. This paper proposes enhancing the cross-lingual portability of a neural network by using landmarks as the secondary task in multi-task learning (MTL). The network is trained in a well-resourced source language with both phone and landmark labels (English), then adapted to an under-resourced target language with only word labels (Iban). Landmark-tasked MTL reduces source-language phone error rate by 2.9% relative, and reduces target-language word error rate by 1.9%-5.9% depending on the amount of target-language training data. These results suggest that landmark-tasked MTL causes the DNN to learn hidden-node features that are useful for cross-lingual adaptation.
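As a rough illustration of the multi-task setup this abstract describes, the sketch below computes a combined loss from a primary (phone) head and a secondary (landmark) head that share one hidden layer. All function names, layer sizes, the ReLU hidden layer, and the 0.3 secondary-task weight are assumptions made for illustration; none of them come from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, vec):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def cross_entropy(probs, target):
    """Negative log-likelihood of the target class index."""
    return -math.log(probs[target])


def mtl_loss(features, shared_w, phone_w, landmark_w,
             phone_target, landmark_target, secondary_weight=0.3):
    """Weighted sum of primary and secondary losses (weight is hypothetical)."""
    # Shared hidden layer (ReLU is an assumption); both heads read from it.
    hidden = [max(0.0, h) for h in linear(shared_w, features)]
    phone_loss = cross_entropy(softmax(linear(phone_w, hidden)), phone_target)
    landmark_loss = cross_entropy(
        softmax(linear(landmark_w, hidden)), landmark_target)
    total = phone_loss + secondary_weight * landmark_loss
    return total, phone_loss, landmark_loss


# Toy usage: 2 input features, 2 hidden units, 3 phone classes, 2 landmark classes.
shared_w = [[1.0, 0.0], [0.0, 1.0]]
phone_w = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
landmark_w = [[0.0, 0.0], [0.0, 0.0]]
total, phone_loss, landmark_loss = mtl_loss(
    [0.5, 1.5], shared_w, phone_w, landmark_w, phone_target=1, landmark_target=0)
```

Because gradients from both losses flow into the shared layer during training, the secondary task can shape the hidden features that the primary task also uses, which is the effect the abstract attributes to landmark-tasked MTL.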

“…No explicit modelling of context-dependent targets as in traditional systems is required. Phones, graphemes or both can be used as acoustic modeling units [15]. Training on whole words is also possible, given enough training data [16].…”

Section: RNN Based ASR Systems (mentioning)

confidence: 99%

Neural Language Codes for Multilingual Acoustic Models

2018

Multilingual Speech Recognition is one of the most costly AI problems, because each language (7,000+) and even different accents require their own acoustic models to obtain best recognition performance. Even though they all use the same phoneme symbols, each language and accent imposes its own coloring or "twang". Many adaptive approaches have been proposed, but they require further training, additional data and generally are inferior to monolingually trained models. In this paper, we propose a different approach that uses a large multilingual model that is modulated by the codes generated by an ancillary network that learns to code useful differences between the "twangs" or human language. We use Meta-Pi networks [1, 2] to have one network (the language code net) gate the activity of neurons in another (the acoustic model nets). Our results show that during recognition multilingual Meta-Pi networks quickly adapt to the proper language coloring without retraining or new data, and perform better than monolingually trained networks. The model was evaluated by training acoustic modeling nets and modulating language code nets jointly and optimizing them for best recognition performance.
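The gating idea in this abstract, where one network scales the activity of neurons in another, can be sketched as an element-wise product. This is a minimal sketch under stated assumptions: the sigmoid code layer, the function names `language_code` and `gated_hidden`, and the toy dimensions are all hypothetical and are not taken from the paper.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def language_code(code_net_w, lang_features):
    """Hypothetical code net: one sigmoid gate value per acoustic hidden unit."""
    return [sigmoid(sum(w * f for w, f in zip(row, lang_features)))
            for row in code_net_w]


def gated_hidden(hidden, code):
    """Scale each acoustic-model hidden activation by its gate value."""
    if len(hidden) != len(code):
        raise ValueError("hidden layer and language code must have equal length")
    return [h * c for h, c in zip(hidden, code)]


# Toy usage: zero code-net weights give a neutral gate of 0.5 for every unit.
code = language_code([[0.0, 0.0], [0.0, 0.0]], [1.0, -1.0])
modulated = gated_hidden([2.0, 4.0], code)
```

In this sketch the acoustic model's weights are untouched at recognition time; only the gate values change with the language, which is consistent with the abstract's claim of adapting without retraining.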
