2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2018.8461614
Multilingual Adaptation of RNN Based ASR Systems

Abstract: In this work, we focus on multilingual systems based on recurrent neural networks (RNNs), trained using the Connectionist Temporal Classification (CTC) loss function. Using a multilingual set of acoustic units poses difficulties. To address this issue, we proposed Language Feature Vectors (LFVs) to train language adaptive multilingual systems. Language adaptation, in contrast to speaker adaptation, needs to be applied not only on the feature level, but also to deeper layers of the network. In this work, we the…

Cited by 15 publications (15 citation statements). References 18 publications (16 reference statements).
“…Although the universal multilingual modelling enjoys richer data resources, the mixture of data creates more variation, especially for those identical IPA symbols shared among different languages. The result is consistent with another recent independent study [29]. This motivates us to apply language adaptive training in the multilingual CTC model.…”
Section: Multilingual Training (supporting)
confidence: 91%
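The excerpt above motivates language adaptive training on top of a multilingual CTC model. At the feature level, LFV-based adaptation amounts to appending a per-language vector to every acoustic frame. A minimal numpy sketch, assuming hypothetical shapes (the frame count, feature dimension, and LFV dimension below are illustrative, not from the paper):

```python
import numpy as np

def append_lfv(features, lfv):
    """Append a language feature vector (LFV) to every frame of an utterance.

    features: (T, D) array of acoustic frames (e.g. log-mel features).
    lfv: (L,) language feature vector, constant over the utterance.
    Returns a (T, D + L) array. Hypothetical helper, not the paper's code.
    """
    T = features.shape[0]
    tiled = np.tile(lfv, (T, 1))          # repeat the LFV for each frame
    return np.concatenate([features, tiled], axis=1)

frames = np.random.randn(100, 40)         # 100 frames of 40-dim features
lfv = np.array([0.2, 0.7, 0.1])           # 3-dim language feature vector
adapted = append_lfv(frames, lfv)
print(adapted.shape)                      # (100, 43)
```

The adapted frames would then be fed to the RNN acoustic model exactly as the unadapted features were.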
“…Thus, speaker features are typically added at the acoustic feature level. For language adaptation, we showed that adding LFVs deeper into the network [3] does improve the performance, and also optimized this approach [18]. Key is a method called modulation, which is based on Meta-PI networks [1,2].…”
Section: LID Figure 1: Language Feature Vectors (LFVs) Network Architecture (mentioning)
confidence: 99%
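The excerpt above describes "modulation" for injecting LFVs into deeper layers: in Meta-PI-style networks, hidden activations are scaled by multiplicative gates. A minimal sketch, assuming a hypothetical sigmoid gating projection `W_gate` from the LFV to per-unit gates (the specific gate parameterization here is an illustrative assumption):

```python
import numpy as np

def modulate(hidden, lfv, W_gate):
    """Meta-PI style modulation: scale hidden units by gates from the LFV.

    hidden: (T, H) activations of a deeper network layer.
    lfv: (L,) language feature vector.
    W_gate: (L, H) hypothetical LFV-to-gate projection, not from the paper.
    """
    gates = 1.0 / (1.0 + np.exp(-(lfv @ W_gate)))   # sigmoid gates in (0, 1)
    return hidden * gates                            # broadcasts over frames

rng = np.random.default_rng(1)
hidden = rng.standard_normal((50, 128))              # 50 frames, 128 units
lfv = np.array([1.0, 0.0, 0.0])                      # LFV for one language
W_gate = rng.standard_normal((3, 128))
out = modulate(hidden, lfv, W_gate)
print(out.shape)                                     # (50, 128)
```

Because the gates depend only on the language, each language effectively selects a soft subset of hidden units, which is what distinguishes this from speaker-style feature-level adaptation.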
“…In this work, we will show how acoustic models can be adapted to languages in a multilingual setting. We have proposed adaptation methods based on language features [3,4] and now propose a new approach that extends our previous work. We will incorporate two novel aspects: a) the use of adaptive neural language codes (NLCs), which are based on language feature vectors (LFVs) [4], but can be adapted during acoustic model training and b) a network superstructure based on Meta-PI [1,2], which allows the use of pre-trained monolingual subnets in a multilingual system.…”
Section: Introduction (mentioning)
confidence: 97%
“…One possible approach to this problem is to utilize data of other languages. There are various approaches to leverage other languages: (a) to train a model multilingually (multi-task learning with other languages), and then further fine-tune to a particular language [6], and (b) to adapt a multilingual model to a new language using transfer learning [6][7][8][9] and additional features obtained from the multilingual model such as multilingual bottleneck features (BNF) [10][11][12][13] and language feature vectors (LFV) [14] (cross-lingual adaptation). To obtain a multilingual S2S model, a part of parameters can be shared while preparing the output layers per language [6], and we can further use a unified architecture with a shared vocabulary among multiple languages [15][16][17].…”
Section: Introduction (mentioning)
confidence: 99%
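The last excerpt mentions sharing a subset of parameters across languages while keeping per-language output layers. A minimal sketch of that layout, assuming a single shared projection in place of the RNN stack and illustrative unit-inventory sizes (all dimensions and language codes below are assumptions for the example):

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared encoder parameters (one projection stands in for the RNN layers).
W_shared = rng.standard_normal((40, 64)) * 0.1

# Per-language output layers over each language's acoustic-unit inventory.
out_layers = {
    "en": rng.standard_normal((64, 30)) * 0.1,   # e.g. 30 English units
    "de": rng.standard_normal((64, 33)) * 0.1,   # e.g. 33 German units
}

def forward(frames, lang):
    """Shared encoder followed by a language-specific softmax layer."""
    h = np.tanh(frames @ W_shared)               # shared representation
    logits = h @ out_layers[lang]
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)      # per-frame unit posteriors

probs = forward(rng.standard_normal((10, 40)), "de")
print(probs.shape)                               # (10, 33)
```

Fine-tuning to a particular language, as in approach (a) of the excerpt, would then update `W_shared` (and the matching output layer) on that language's data alone; a shared-vocabulary variant would instead use one output layer for all languages.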