ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp39728.2021.9413734

Joint ASR and Language Identification Using RNN-T: An Efficient Approach to Dynamic Language Switching

Abstract: Conventional dynamic language switching enables seamless multilingual interactions by running several monolingual ASR systems in parallel and triggering the appropriate downstream components using a standalone language identification (LID) service. Since this solution is neither scalable nor cost- and memory-efficient, especially for on-device applications, we propose end-to-end, streaming, joint ASR-LID architectures based on the recurrent neural network transducer framework. Two key formulations are explored:…

Cited by 13 publications (10 citation statements)
References 21 publications
“…It is known that acoustic and linguistic information can be combined to improve LID prediction [9,12,28]. The concatenation of $e^{\mathrm{enc1}}_{r:t+r}$ and $e^{\mathrm{enc2}}_{1:t}$ allows the LID predictor to leverage such complementary information easily.…”
Section: Joining a LID Predictor With Cascaded Encoders (citation type: mentioning)
confidence: 99%
“…Our work differs from this as we predict the LIDs instead of using the oracle ones. Another body of work looks at techniques to predict LID and use the predictions in the ASR system or downstream tasks [21][22][23][24][25][26][27][28][29]. Much of this work focuses on LID predictions in a non-streaming system [23][24][25][26][30][31][32], which does not fit into our streaming ASR setup that is important due to production constraints.…”
Section: Introduction (citation type: mentioning)
confidence: 99%
“…The resulting E2E model can perform utterance-based multilingual ASR. The works in [4] [5] [6] [7] aim to build an E2E model that can improve code switching. While these approaches are different from each other, there are some similarities among them.…”
Section: Introduction (citation type: mentioning)
confidence: 99%