ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp43922.2022.9746594
Massively Multilingual ASR: A Lifelong Learning Solution

Cited by 25 publications (12 citation statements) | References 24 publications
“…Although WPMs are more commonly adopted than graphemes for monolingual ASR, an output layer with multilingual WPMs generated by pooling all monolingual data together can often be overly large when many languages and writing scripts are integrated [28,29]. Separate monolingual output layers can be used as a solution, an idea that dates back to earlier work with phonemes and graphemes [30,31].…”
Section: Related Work (citation type: mentioning)
Confidence: 99%
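To make the size trade-off in that statement concrete, here is a minimal PyTorch sketch, not taken from the paper: the encoder dimension, vocabulary sizes, and language codes are hypothetical. It contrasts a single pooled multilingual WPM output layer with separate monolingual heads on a shared encoder.

```python
# Minimal sketch (hypothetical sizes): a pooled multilingual WPM
# output layer vs. separate monolingual output layers.
import torch
import torch.nn as nn

ENCODER_DIM = 512

# Hypothetical per-language WPM inventory sizes.
wpm_vocab_sizes = {"en": 4096, "zh": 8192, "hi": 4096, "ru": 4096}

# Pooled design: one softmax over the union of all WPM inventories.
# Its width grows with every added language and writing script.
pooled_output = nn.Linear(ENCODER_DIM, sum(wpm_vocab_sizes.values()))

# Separate design: one small head per language on a shared encoder,
# selected by language ID at run time.
monolingual_heads = nn.ModuleDict({
    lang: nn.Linear(ENCODER_DIM, size)
    for lang, size in wpm_vocab_sizes.items()
})

def logits(lang: str, frames: torch.Tensor) -> torch.Tensor:
    """Route shared encoder frames through the head for `lang`."""
    return monolingual_heads[lang](frames)
```

With these invented numbers the pooled softmax spans 20,480 units, while no monolingual head exceeds 8,192; the per-language design keeps each softmax small at the cost of routing by language ID.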
“…First, the UML allows multilingual ASR to scale gracefully to any number of languages without increasing the output layer size [28,29]. It is smaller than the conventional multilingual output layer and improves computational efficiency in both RNN-T training and decoding.…”
Section: A Universal Monolingual Output Layer (citation type: mentioning)
Confidence: 99%
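The scaling property this quote describes can be sketched as follows. This is an illustration of the idea only, not the paper's implementation: the unit count and the token-to-unit mappings are invented.

```python
# Sketch of a universal monolingual output layer (UML): a single
# fixed-width output layer whose units are reused by every language.
import torch.nn as nn

ENCODER_DIM = 512
UNIVERSAL_UNITS = 4096  # fixed; does not grow with the language count

universal_output = nn.Linear(ENCODER_DIM, UNIVERSAL_UNITS)

# Adding a language adds only a lookup table that maps its own WPM
# inventory onto the shared unit IDs, not new output parameters.
wpm_to_unit = {
    "en": {"_the": 17, "_and": 42},  # hypothetical mappings
    "zh": {"你": 17, "好": 42},       # the same units, reused
}
```

Decoding would then interpret unit IDs through the active language's table, so the softmax stays at a fixed width no matter how many languages are added.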
“…As no label information is needed, this approach can easily scale up to more diverse speech data without any human transcription effort. With supervised multitask learning similar to [13], different tasks are unified into a single heterogeneous discriminative task and the model is trained jointly on these tasks, such as multi-domain tasks [5,18] or multilingual tasks [19,20]. A prerequisite of this approach is having some labeled data for the tasks that the FMs are trained on.…”
Section: Introduction (citation type: mentioning)
Confidence: 99%
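As a rough illustration of such joint training over heterogeneous tasks, the sketch below shares one encoder across task-specific heads and sums weighted per-task losses. The module names, the use of CTC as the criterion, and all sizes are assumptions for the sketch, not details of the cited systems.

```python
# Sketch of heterogeneous multitask training: a shared encoder,
# task-specific output heads, and a weighted sum of per-task losses.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultitaskASR(nn.Module):
    def __init__(self, task_vocab_sizes, feat_dim=80, enc_dim=512):
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, enc_dim, batch_first=True)
        self.heads = nn.ModuleDict({
            task: nn.Linear(enc_dim, vocab)
            for task, vocab in task_vocab_sizes.items()
        })

    def forward(self, feats, task):
        frames, _ = self.encoder(feats)
        return self.heads[task](frames)

def joint_step(model, batches, weights):
    """One step over mixed batches: sum weighted per-task CTC losses."""
    total = 0.0
    for task, (feats, targets, target_lens) in batches.items():
        log_probs = model(feats, task).log_softmax(-1)   # (B, T, V)
        input_lens = torch.full((feats.size(0),), log_probs.size(1),
                                dtype=torch.long)
        # CTC is used purely for illustration; the cited systems may
        # use RNN-T or other discriminative criteria.
        loss = F.ctc_loss(log_probs.transpose(0, 1), targets,
                          input_lens, target_lens)
        total = total + weights[task] * loss
    return total
```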
“…Recent advances [1,2,3,4,5,6] in developing large-scale ASR architectures have demonstrated promising results for English speech recognition tasks. Moreover, English ASR models with self-supervised training objectives, such as wav2vec2 [7], w2v-BERT [8], and BigSSL [9], further boost recognition performance, extending the existing supervised ASR framework built on annotated data.…”
Section: Introduction (citation type: mentioning)
Confidence: 99%