Interspeech 2021
DOI: 10.21437/interspeech.2021-1710

Self-Supervised End-to-End ASR for Low Resource L2 Swedish

Abstract: Unlike traditional (hybrid) Automatic Speech Recognition (ASR), end-to-end ASR systems simplify the training procedure by directly mapping acoustic features to sequences of graphemes or characters, thereby eliminating the need for specialized acoustic, language, or pronunciation models. However, one drawback of end-to-end ASR systems is that they require more training data than conventional ASR systems to achieve similar word error rate (WER). This makes it difficult to develop ASR systems for tasks where tran…
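The abstract compares end-to-end and hybrid systems by word error rate (WER). For reference, WER is the word-level Levenshtein distance (substitutions, insertions, deletions) between a reference transcript and the system hypothesis, normalized by the reference length. A minimal sketch (illustrative, not code from the paper):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # d[i][j] = edits needed to turn ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)
```

For example, one substitution in a four-word reference gives a WER of 0.25.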

Cited by 7 publications (3 citation statements). References 18 publications.
“…Unlike Finnish, Swedish has its own monolingual wav2vec2.0 model [26]. Because the preliminary experiments [6] indicated that the monolingual model works better for the target language than the multilingual one, we adopted it as our baseline. We then fine-tuned it directly with the SweSchool portion of the DigiTala data (the three folds selected for training) as in [18].…”
Section: The Proposed Benchmark and Baseline
confidence: 99%
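The statement above describes fine-tuning on three DigiTala folds selected for training while holding the remaining data out. A minimal sketch of such a fold-based split (fold count and function names are illustrative assumptions, not from the paper):

```python
def make_folds(items, n_folds=4):
    """Partition items round-robin into n_folds roughly equal folds."""
    folds = [[] for _ in range(n_folds)]
    for i, item in enumerate(items):
        folds[i % n_folds].append(item)
    return folds

def split_by_fold(folds, test_fold):
    """Hold out one fold for evaluation; concatenate the rest for training."""
    train = [x for i, fold in enumerate(folds) if i != test_fold for x in fold]
    return train, folds[test_fold]
```

With four folds, each split uses three folds for fine-tuning and one for evaluation, as in the setup described.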
“…There has not been enough transcribed training data for ASR development, nor rated speech data for ASA training. However, there have recently been many successful attempts to apply self-supervised deep acoustic transformer models like wav2vec2.0 [5] to low-resource domains, including systems for ASR and various audio classification tasks [6,7,8,9]. Inspired by the potential of the latest technology and the significance ASA may have for society, we have recently collected and annotated a significant amount of Finnish and Finland Swedish L2 learners' speech data in the DigiTala project.…”
Section: Introduction
confidence: 99%
“…This study investigates content relevancy scoring using two corpora of non-native spontaneous speech: Finnish and Finland Swedish (Al-Ghezi et al., 2021). The Swedish data was collected from upper secondary school students, while the Finnish data contains responses from both upper secondary school students and university students.…”
Section: Data
confidence: 99%