On Addressing Practical Challenges for RNN-Transducer

Zhao, Rui; Xue, Jian; Li, Jinyu; Wei, Wenning; Gong, Yifan

doi:10.1109/asru51503.2021.9688101

Cited by 13 publications

(1 citation statement)

References 37 publications

“…To effectively adapt large-scale pre-trained English ASR models to different language recognition (e.g., English to French), previous research efforts [30] suggest a solution by replacing the last prediction layer of ASR. However, we investigate that deploying "multilingual graphemes" is even more effective than replacing the final prediction head directly.…”

Section: English Graphemes Pre-training For Multilingual Data (mentioning; confidence: 99%)
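The baseline named in the quoted passage, adapting a pre-trained English model to a new language by replacing its final prediction layer, can be sketched roughly as follows. This is a minimal NumPy illustration under assumed sizes, not either paper's model: a single frozen tanh layer stands in for the pre-trained backbone, and the sizes and helper names (`backbone`, `train_head_step`) are hypothetical. Only the new output head receives gradient updates; the multilingual-grapheme alternative the citing paper prefers is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
FEAT_DIM, HIDDEN_DIM = 80, 256
FR_VOCAB = 40  # assumed size of the target-language grapheme inventory

# Stand-in for a pre-trained English backbone: frozen, never updated.
backbone_w = rng.standard_normal((FEAT_DIM, HIDDEN_DIM)) * 0.05

def backbone(x):
    """Frozen feature extractor: one tanh layer as a placeholder."""
    return np.tanh(x @ backbone_w)

# The original English head is discarded and replaced by a freshly
# initialised head sized for the target vocabulary (the only trainable part).
fr_head = np.zeros((HIDDEN_DIM, FR_VOCAB))

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def train_head_step(head, x, labels, lr=0.1):
    """One mean cross-entropy gradient step that updates only the new head."""
    h = backbone(x)                               # frozen features
    probs = softmax(h @ head)
    probs[np.arange(len(labels)), labels] -= 1.0  # dL/dlogits = softmax - one-hot
    grad = h.T @ probs / len(labels)              # dL/dhead, averaged over the batch
    return head - lr * grad

x = rng.standard_normal((16, FEAT_DIM))
y = rng.integers(0, FR_VOCAB, size=16)
backbone_before = backbone_w.copy()
fr_head = train_head_step(fr_head, x, y)

print("trainable params:", fr_head.size)
print("frozen params:", backbone_w.size)
print("backbone unchanged:", np.array_equal(backbone_w, backbone_before))
```

In a real RNN-Transducer the replaced layer would be the joint network's output projection, and the blank symbol would be kept in the new vocabulary.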

From English to More Languages: Parameter-Efficient Model Reprogramming for Cross-Lingual Speech Recognition

Yang, Li, Chen et al., 2023

Preprint

In this work, we propose a new parameter-efficient learning framework based on neural model reprogramming for cross-lingual speech recognition, which can re-purpose well-trained English automatic speech recognition (ASR) models to recognize other languages. We design different auxiliary neural architectures focusing on learnable pre-trained feature enhancement that, for the first time, enable model reprogramming for ASR. Specifically, we investigate how to select trainable components (i.e., the encoder) of a conformer-based RNN-Transducer used as a frozen pre-trained backbone. Experiments on a seven-language Multilingual LibriSpeech (MLS) task show that model reprogramming requires only 4.2% (11M out of 270M) to 6.8% (45M out of 660M) of the trainable parameters of a full ASR model to achieve competitive results, with average WERs across languages ranging from 11.9% to 8.1%. In addition, we identify setups that make large-scale pre-trained ASR succeed in both monolingual and multilingual speech recognition. Our methods outperform existing ASR tuning architectures and their extensions with self-supervised losses (e.g., w2v-BERT), achieving lower WER and better training efficiency.
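The core idea of reprogramming, keeping a pre-trained backbone frozen and training only a small auxiliary module, can be illustrated with a toy parameter count. The sketch below is an assumption-laden NumPy illustration, not the authors' architecture: it places a hypothetical residual bottleneck ("reprogramming") module on the input features, uses a stack of residual tanh layers as a stand-in backbone, and picks arbitrary sizes. The resulting trainable fraction is a property of these toy sizes only and is not meant to reproduce the 4.2% to 6.8% reported above.

```python
import numpy as np

rng = np.random.default_rng(0)

FEAT_DIM, HIDDEN_DIM, NUM_LAYERS = 80, 512, 12  # hypothetical backbone sizes
BOTTLENECK = 32                                 # hypothetical reprogramming width

# Frozen stand-in backbone: input projection plus residual dense layers.
input_proj = rng.standard_normal((FEAT_DIM, HIDDEN_DIM)) * 0.02
frozen_layers = [rng.standard_normal((HIDDEN_DIM, HIDDEN_DIM)) * 0.02
                 for _ in range(NUM_LAYERS)]

# Trainable reprogramming module: residual bottleneck on the input features.
reprog_down = rng.standard_normal((FEAT_DIM, BOTTLENECK)) * 0.02
reprog_up = np.zeros((BOTTLENECK, FEAT_DIM))  # zero init: starts as identity

def reprogram(x):
    """x + up(relu(down(x))): learnable transformation of the input features."""
    return x + np.maximum(x @ reprog_down, 0.0) @ reprog_up

def frozen_backbone(x):
    """Pre-trained layers whose weights are never updated."""
    h = x @ input_proj
    for w in frozen_layers:
        h = h + np.tanh(h @ w)
    return h

def encode(x):
    """Reprogrammed input fed through the frozen backbone."""
    return frozen_backbone(reprogram(x))

frozen = input_proj.size + sum(w.size for w in frozen_layers)
trainable = reprog_down.size + reprog_up.size
total = frozen + trainable
print(f"trainable: {trainable} / {total} ({100 * trainable / total:.2f}%)")
```

Zero-initialising the up-projection makes the module an identity map at the start of training, so the frozen model's original behaviour is the starting point; only `reprog_down` and `reprog_up` would be passed to an optimizer.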
