Interspeech 2021
DOI: 10.21437/interspeech.2021-1710

Self-Supervised End-to-End ASR for Low Resource L2 Swedish

Abstract: Unlike traditional (hybrid) Automatic Speech Recognition (ASR), end-to-end ASR systems simplify the training procedure by directly mapping acoustic features to sequences of graphemes or characters, thereby eliminating the need for specialized acoustic, language, or pronunciation models. However, one drawback of end-to-end ASR systems is that they require more training data than conventional ASR systems to achieve similar word error rate (WER). This makes it difficult to develop ASR systems for tasks where tran…
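The abstract compares end-to-end and hybrid systems by word error rate (WER). For reference, WER is the word-level Levenshtein distance (substitutions, insertions, deletions) between a reference transcript and the system hypothesis, normalized by the reference length. A minimal sketch (illustrative, not code from the paper):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # d[i][j] = edits needed to turn ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)
```

For example, one substitution in a four-word reference gives a WER of 0.25.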

Cited by 7 publications (3 citation statements). References 18 publications.
“…Unlike Finnish, Swedish has its own monolingual wav2vec2.0 model [26]. Because the preliminary experiments [6] indicated that the monolingual model works better for the target language than the multilingual one, we adopted it as our baseline. We then fine-tuned it directly with the SweSchool portion of the DigiTala data (the three folds selected for training) as in [18].…”
Section: The Proposed Benchmark and Baseline
confidence: 99%
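The statement above describes fine-tuning on three DigiTala folds selected for training while holding the remaining data out. A minimal sketch of such a fold-based split (fold count and function names are illustrative assumptions, not from the paper):

```python
def make_folds(items, n_folds=4):
    """Partition items round-robin into n_folds roughly equal folds."""
    folds = [[] for _ in range(n_folds)]
    for i, item in enumerate(items):
        folds[i % n_folds].append(item)
    return folds

def split_by_fold(folds, test_fold):
    """Hold out one fold for evaluation; concatenate the rest for training."""
    train = [x for i, fold in enumerate(folds) if i != test_fold for x in fold]
    return train, folds[test_fold]
```

With four folds, each split uses three folds for fine-tuning and one for evaluation, as in the setup described.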
“…There has not been enough transcribed training data for ASR development, nor rated speech data for ASA training. However, there have recently been many successful attempts to apply self-supervised deep acoustic transformer models like wav2vec2.0 [5] to low-resource domains, including systems for ASR and various audio classification tasks [6,7,8,9]. Inspired by the potential of the latest technology and the significance ASA may have for society, we have recently collected and annotated a significant amount of Finnish and Finland Swedish L2 learners' speech data in the DigiTala project.…”
Section: Introduction
confidence: 99%
“…This study investigates content relevancy scoring using two corpora of non-native spontaneous speech: Finnish and Finland Swedish (Al-Ghezi et al., 2021). The Swedish data was collected from upper secondary school students, while the Finnish data contains responses from both upper secondary school students and university students.…”
Section: Data
confidence: 99%