Interspeech 2022
DOI: 10.21437/interspeech.2022-923

LAE: Language-Aware Encoder for Monolingual and Multilingual ASR

Cited by 8 publications (3 citation statements)
“…What should be the behavior of the monolingual Mandarin module $p(Z^M \mid X)$ when encountering a segment of English speech and vice versa? Monolingual modules in prior works [29][30][31] determine each label-to-frame alignment $z^{M/E}_t$ by first determining the language identity of each speech frame $\mathrm{LID}(x_t)$ [34]. If the speech frame $x_t$ is from a foreign language then the module will ignore it by emitting a special <NULL> token, otherwise it will transcribe using its monolingual vocabulary.…”
Section: Modeling $P(z^{M/E} \mid X)$ With Language Segmentation (citation type: mentioning; confidence: 99%)
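The frame-level behavior described in the statement above can be illustrated with a small sketch. It assumes that a hard language decision per frame (standing in for $\mathrm{LID}(x_t)$) and a best token per frame from the module's own vocabulary are already available; the function name, the "M"/"E" labels, and the toy tokens are hypothetical and not taken from the cited works, which model these quantities with neural networks rather than hard decisions.

```python
from typing import List, Sequence

NULL = "<NULL>"  # special token emitted for frames judged to be from the foreign language


def monolingual_alignment(
    frame_lids: Sequence[str],
    frame_tokens: Sequence[str],
    module_lang: str,
) -> List[str]:
    """Hypothetical sketch of a monolingual module's label-to-frame alignment z_t.

    frame_lids[t]   -- assumed hard frame-level language decision LID(x_t), "M" or "E"
    frame_tokens[t] -- assumed best token from this module's monolingual vocabulary
    module_lang     -- language this module transcribes ("M" = Mandarin, "E" = English)
    """
    if len(frame_lids) != len(frame_tokens):
        raise ValueError("need exactly one LID decision and one token per frame")
    alignment = []
    for lid, token in zip(frame_lids, frame_tokens):
        # Foreign-language frames are ignored by emitting <NULL>;
        # frames in the module's own language keep the monolingual token.
        alignment.append(token if lid == module_lang else NULL)
    return alignment


if __name__ == "__main__":
    lids = ["M", "M", "E", "E", "M"]
    mandarin_tokens = ["我", "们", "_", "_", "好"]        # "_" = placeholder, ignored anyway
    english_tokens = ["_", "_", "hello", "world", "_"]
    print(monolingual_alignment(lids, mandarin_tokens, "M"))
    print(monolingual_alignment(lids, english_tokens, "E"))
```

Run on the toy input, the Mandarin module yields `['我', '们', '<NULL>', '<NULL>', '好']` and the English module yields `['<NULL>', '<NULL>', 'hello', 'world', '<NULL>']`, so each code-switched segment is transcribed by exactly one module.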
“…Finally, let us consider how to construct a neural architecture for our modified conditionally factorized framework. Monolingual and bilingual label-to-frame posteriors (§2.1) may be modeled using CTC or RNN-T networks as demonstrated by prior works [29][30][31]. However for zero-shot CS ASR, the conditional independence assumption of CTC vs. the internal language modeling of RNN-T is a critical difference.…”
Section: Conditional CTC With External LM Architecture (citation type: mentioning; confidence: 99%)
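To make the contrast in the statement above concrete, the sketch below scores a frame-level alignment under CTC's conditional independence assumption: $\log P(z \mid X) = \sum_t \log p(z_t \mid X)$, so no frame's score depends on the labels chosen at other frames. An RNN-T, by contrast, conditions each output on previously emitted labels through its prediction network, which is what gives it an internal language model. The posteriors, the blank symbol `<b>`, and the function names are illustrative assumptions, not values or code from the cited works.

```python
import math
from typing import Dict, List, Sequence

BLANK = "<b>"  # assumed CTC blank symbol


def ctc_alignment_log_prob(
    frame_posteriors: Sequence[Dict[str, float]],
    alignment: Sequence[str],
) -> float:
    """log P(z | X) for one frame-level alignment under CTC's independence assumption.

    Each term uses only that frame's posterior p(z_t | X); nothing conditions on
    labels at other frames, so CTC carries no internal language model.
    """
    if len(frame_posteriors) != len(alignment):
        raise ValueError("alignment length must equal the number of frames")
    return sum(math.log(post[label]) for post, label in zip(frame_posteriors, alignment))


def ctc_collapse(alignment: Sequence[str]) -> List[str]:
    """Map a frame-level CTC alignment to a label sequence:
    merge consecutive repeats, then drop blanks."""
    labels: List[str] = []
    prev = None
    for label in alignment:
        if label != prev and label != BLANK:
            labels.append(label)
        prev = label
    return labels


if __name__ == "__main__":
    # Toy per-frame posteriors for a 3-frame utterance (made-up numbers).
    posteriors = [
        {"a": 0.6, BLANK: 0.4},
        {"a": 0.5, BLANK: 0.5},
        {"b": 0.7, BLANK: 0.3},
    ]
    alignment = ["a", BLANK, "b"]
    print(round(math.exp(ctc_alignment_log_prob(posteriors, alignment)), 4))  # 0.6 * 0.5 * 0.7
    print(ctc_collapse(["a", "a", BLANK, "a", "b"]))
```

With these toy numbers the alignment probability is simply the product 0.6 × 0.5 × 0.7 = 0.21, and the collapse step maps `a a <b> a b` to `['a', 'a', 'b']` (the blank separates the two genuine `a` labels). Because each factor is local to a frame, an external LM can be combined with CTC without competing against a learned internal LM, which is one reading of why the statement calls this difference critical for zero-shot code-switching.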