Efficient Weight factorization for Multilingual Speech Recognition

Pham, Ngoc-Quan; Nguyen, Tuan-Nam; Stueker, Sebastian; Waibel, Alexander

doi:10.48550/arxiv.2105.03010

Cited by 1 publication

(6 citation statements)

References 35 publications

(64 reference statements)

Supporting

Mentioning

Contrasting

Order By: Relevance

“…We also provided further analysis to the architecture, by showing that there are benefits to either improving the self-attention mechanism by adding relative positions during fine-tuning, or stacking the MBART encoders to the wav2vec counterpart. This is the continuation of the line of work building multilingual ASR systems based on sequence-to-sequence neural networks [17,5,16]. Our work is available for public at https://github.com/quanpn90/NMTGMinor providing a highly CUDA-optimized implementation for both wav2vec and MBART which is potentially useful for the community.…”

Section: Introductionmentioning

confidence: 87%

“…The motivation comes from the disbelief that there are features being shared between languages and at the same time each language requires to selectively represented, and networks are encouraged to change "modes" depending on the language being processed [23]. Since then, multingual model designers opt to use specific network components being presented for each language, ranging from weight generator [24] to adapters [15,22] or recently adaptive weights adding scales and biases to each weight matrix in the whole architecture [16]. In this paper, the last two options are selected for investigation thanks to being computationally manageable.…”

Section: Language Adaptive Componentsmentioning

confidence: 99%

“…On the other hand, adaptive weights [16] was proposed based on the observation that neural networks evolve rapidly yet the core remains to be matrix multiplication. Therefore, it is possible to separate the weight matrix into a shared component WS and language dependent adaptive scale WML and bias WBL.…”

Section: Language Adaptive Componentsmentioning

confidence: 99%

“…More importantly, there are many improvements for the Transformers over the years that improve either modeling long range dependencies in self-attention [14,10] or adaptive components for multilingual models [15,16]. It is promising to also combine these techniques with the pretrained models for the best results.…”

Section: Introductionmentioning

confidence: 99%

“…• Second, the language specific modulation techniques such as language adapters [15] and factorized adaptive weights [16] complement the two pretrained modules very well and have a strong impact on especially the lowresource languages mentioned above.…”

Section: Introductionmentioning

confidence: 99%

See 4 more Smart Citations

Adaptive multilingual speech recognition with pretrained models

Pham¹,

Waibel²,

Niehues³

2022

Preprint

Self Cite

View full text Add to dashboard Cite

Multilingual speech recognition with supervised learning has achieved great results as reflected in recent research. With the development of pretraining methods on audio and text data, it is imperative to transfer the knowledge from unsupervised multilingual models to facilitate recognition, especially in many languages with limited data. Our work investigated the effectiveness of using two pretrained models for two modalities: wav2vec 2.0 for audio and MBART50 for text, together with the adaptive weight techniques to massively improve the recognition quality on the public datasets containing CommonVoice and Europarl. Overall, we noticed an 44% improvement over purely supervised learning, and more importantly, each technique provides a different reinforcement in different languages. We also explore other possibilities to potentially obtain the best model by slightly adding either depth or relative attention to the architecture.

show abstract

Section: Introductionmentioning

confidence: 87%