2020
DOI: 10.48550/arxiv.2007.03900
Preprint

Streaming End-to-End Bilingual ASR Systems with Joint Language Identification

Abstract: Multilingual ASR technology simplifies model training and deployment, but its accuracy is known to depend on the availability of language information at runtime. Since language identity is seldom known beforehand in real-world scenarios, it must be inferred on-the-fly with minimum latency. Furthermore, in voice-activated smart assistant systems, language identity is also required for downstream processing of ASR output. In this paper, we introduce streaming, end-to-end, bilingual systems that perform both ASR …
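The abstract says language identity must be inferred on-the-fly with minimum latency. Because the abstract is truncated, the sketch below is not the paper's method; it only illustrates one generic way a streaming system might commit to a language early: sum per-frame language log-probabilities (treating frames as independent), renormalize, and stop as soon as the leading language passes a confidence threshold. The function name `streaming_lid` and the parameters `threshold` and `min_frames` are hypothetical.

```python
import math
from typing import Iterable, Optional, Sequence, Tuple


def streaming_lid(
    frame_posteriors: Iterable[Sequence[float]],
    languages: Sequence[str],
    threshold: float = 0.9,
    min_frames: int = 10,
) -> Tuple[Optional[str], int]:
    """Return (language, frames_consumed) as soon as the decision is confident.

    frame_posteriors: per-frame probability vectors over `languages`, assumed to
    come from some streaming LID head (hypothetical). Returns (None, n) if the
    stream ends before the threshold is reached.
    """
    log_scores = [0.0] * len(languages)
    n = 0
    for probs in frame_posteriors:
        n += 1
        # Accumulate evidence; clamp to avoid log(0).
        for i, p in enumerate(probs):
            log_scores[i] += math.log(max(p, 1e-12))
        # Renormalize accumulated scores into a posterior (log-sum-exp trick).
        m = max(log_scores)
        z = sum(math.exp(s - m) for s in log_scores)
        posterior = [math.exp(s - m) / z for s in log_scores]
        best = max(range(len(languages)), key=lambda i: posterior[i])
        # Commit early once enough frames are seen and confidence is high.
        if n >= min_frames and posterior[best] >= threshold:
            return languages[best], n
    return None, n


print(streaming_lid([[0.6, 0.4]] * 20, ["en", "es"], min_frames=1))  # → ('en', 6)
```

Raising `threshold` trades latency for reliability: a stricter threshold waits for more audio before committing to a language.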

Cited by 10 publications (14 citation statements)
References: 28 publications
“…This could be the reason why the confusion rates are similar. To further reduce the confusion rate, we may consider various language identification approaches [1], [2], [18]. We calculate the average length of hypotheses from the bilingual experiments.…”
Section: Language Confusion (mentioning, confidence: 99%)
“…The work developed in [15] develops an LID model that uses acoustic and text embeddings to choose the correct ASR model in 4 different languages. The work in [16] uses pre-trained LID embeddings to choose between ASR models in English-Spanish and English-Hindi pairs.…”
Section: Related Work (mentioning, confidence: 99%)
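The statement above describes using pre-trained LID embeddings to choose between ASR models for English-Spanish and English-Hindi pairs. A minimal sketch of one way such routing could work, assuming each language is represented by a reference (centroid) embedding and the utterance is routed to the model of the most similar language by cosine similarity. The names `select_asr_model`, `asr-en`, and `asr-es` and the toy 3-dimensional embeddings are hypothetical, not taken from the cited work.

```python
import math
from typing import Dict, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_asr_model(
    utterance_emb: Sequence[float],
    language_centroids: Dict[str, Sequence[float]],
    asr_models: Dict[str, str],
) -> str:
    """Route to the ASR model whose language centroid is closest to the utterance."""
    best_lang = max(
        language_centroids,
        key=lambda lang: cosine(utterance_emb, language_centroids[lang]),
    )
    return asr_models[best_lang]


# Toy embeddings; real LID embeddings are much higher-dimensional.
centroids = {"en": [1.0, 0.1, 0.0], "es": [0.1, 1.0, 0.2]}
models = {"en": "asr-en", "es": "asr-es"}
print(select_asr_model([0.2, 0.9, 0.1], centroids, models))  # → asr-es
```

In a pipeline like this, an LID error sends the whole utterance to the wrong model, which is one reason the abstract above argues for identifying the language jointly with recognition.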
“…To remove the dependency on knowing one-hot LID in advance, one way is to estimate the LID and use it as the additional input to E2E multilingual models. However, the gain is very limited, especially for streaming E2E models, because the estimation is not very reliable [138,139]. Another solution is to build a corresponding multilingual E2E model for any set of language combinations.…”
Section: Multilingual Modeling (mentioning, confidence: 99%)
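The last statement contrasts feeding a known one-hot LID to an E2E multilingual model with feeding an estimated LID as the additional input. A minimal sketch of that conditioning step, assuming the LID vector is simply concatenated onto every acoustic feature frame (one common choice; the cited works may condition the model differently). The helpers `one_hot` and `append_lid_vector` and the 2-dimensional toy features are hypothetical.

```python
from typing import List, Sequence


def one_hot(lang: str, languages: Sequence[str]) -> List[float]:
    """One-hot LID vector; only available when the language is known in advance."""
    return [1.0 if name == lang else 0.0 for name in languages]


def append_lid_vector(
    features: Sequence[Sequence[float]], lid_vector: Sequence[float]
) -> List[List[float]]:
    """Concatenate the same LID vector onto every acoustic feature frame."""
    return [list(frame) + list(lid_vector) for frame in features]


frames = [[0.1, 0.2], [0.3, 0.4]]  # toy acoustic features, 2 frames x 2 dims
print(append_lid_vector(frames, one_hot("es", ["en", "es"])))
# → [[0.1, 0.2, 0.0, 1.0], [0.3, 0.4, 0.0, 1.0]]
print(append_lid_vector(frames, [0.7, 0.3]))  # estimated (soft) LID posterior
```

With an estimated vector, a streaming model must condition on an LID guess computed from partial audio, which is consistent with the statement's observation that the estimate is not very reliable for streaming E2E models.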