2022
DOI: 10.48550/arxiv.2203.16822
Preprint

How Does Pre-trained Wav2Vec 2.0 Perform on Domain Shifted ASR? An Extensive Benchmark on Air Traffic Control Communications

Abstract: Recent work on self-supervised pre-training focuses on leveraging large-scale unlabeled speech data to build robust end-to-end (E2E) acoustic models (AM) that can later be fine-tuned on downstream tasks, e.g., automatic speech recognition (ASR). Yet, few works have investigated the impact on performance when the data substantially differs between the pre-training and downstream fine-tuning phases (i.e., domain shift). We target this scenario by analyzing the robustness of Wav2Vec2.0 and XLS-R models on downstream ASR fo…
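The abstract describes the standard workflow the paper benchmarks: take a Wav2Vec 2.0 checkpoint pre-trained on general speech and fine-tune or evaluate it on domain-shifted audio such as ATC recordings. As a rough illustration only, the sketch below loads a pre-trained checkpoint with the HuggingFace Transformers library and greedily decodes one utterance; the checkpoint name, audio file path, and decoding scheme are assumptions for illustration, not the paper's exact configuration.

```python
# Minimal sketch: load a pre-trained Wav2Vec 2.0 CTC model and transcribe one
# utterance. This is the starting point before any domain-specific fine-tuning.
# The checkpoint name and audio path are illustrative assumptions, not the
# paper's setup.
import torch
import soundfile as sf
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")
model.eval()

# Load a 16 kHz mono recording (hypothetical ATC utterance).
speech, sample_rate = sf.read("atc_utterance.wav")
inputs = processor(speech, sampling_rate=sample_rate, return_tensors="pt")

with torch.no_grad():
    logits = model(inputs.input_values).logits

# Greedy CTC decoding to a character-level transcript.
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)[0]
print(transcription)
```

Fine-tuning on a shifted domain would replace the frozen forward pass above with a CTC training loop over labeled in-domain transcripts, which is the setting the paper evaluates.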

Cited by 1 publication (1 citation statement)
References: 26 publications
“…In [15][16][17], the authors aimed to improve the recognition of the callsigns in ASR by integrating surveillance data. Finally, the authors of [18] investigated the effect of fine-tuning large pre-trained models, trained using a Transformer architecture, for application in the ATC domain.…”
Section: Automatic Speech Recognition (ASR)
Citation type: mentioning; confidence: 99%