2022
DOI: 10.48550/arxiv.2208.03067
Preprint

Large vocabulary speech recognition for languages of Africa: multilingual modeling and self-supervised learning

Abstract: Almost none of the 2,000+ languages spoken in Africa have widely available automatic speech recognition systems, and the required data is also only available for a few languages. We have experimented with two techniques which may provide pathways to large vocabulary speech recognition for African languages: multilingual modeling and self-supervised learning. We gathered available open source data and collected data for 15 languages, and trained experimental models using these techniques. Our results show that …

Cited by 1 publication (2 citation statements)
References 29 publications (39 reference statements)
“…As the MCV dataset evolved through multiple versions, several studies and experimental results (Ritchie et al., 2022; Ravanelli et al., 2021; Kuchaiev et al., 2019) have been reported on the dataset. The best results are generally obtained by fine-tuning pre-trained models such as wav2vec2.0 (Baevski et al., 2020), which are typically pre-trained on large English-only or multilingual speech data.…”
Section: Related Work
confidence: 99%
“…Recent advances in deep learning techniques for end-to-end speech recognition and the availability of open source frameworks and datasets allow us to empirically explore different ways to improve ASR performance for Kinyarwanda. While recent experimental reports and studies (Ravanelli et al., 2021; Ritchie et al., 2022) have shown improvements in ASR for Kinyarwanda, mostly via self-supervised pre-training (Self-PT) representations such as wav2vec2.0 (Baevski et al., 2020), there has been no exploration of using Kinyarwanda-only speech data for Self-PT pre-training, or of how to improve performance beyond using Self-PT representations. In this work, we report empirical experiments showing how ASR performance for Kinyarwanda can be improved through Self-PT pre-training on Kinyarwanda-only speech data, following a simple curriculum learning schedule during fine-tuning, and using semi-supervised learning (Semi-SL) to leverage large unlabelled data.…”
Section: Introduction
confidence: 99%
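The citation above mentions a "simple curriculum learning schedule during fine-tuning" without specifying it. One common form for ASR orders utterances by duration so early training steps see short, easier examples and later epochs progressively admit longer ones. The sketch below illustrates that idea only; the function name, the dictionary fields, and the fraction-per-epoch schedule are hypothetical, not taken from the cited paper.

```python
# Length-based curriculum schedule (illustrative sketch, not the paper's method).
# Utterances are sorted by duration; epoch e draws batches from the shortest
# fraction (e + 1) / epochs of the sorted data, so the pool grows each epoch.

def curriculum_batches(utterances, epochs, batch_size):
    """Yield (epoch, batch) pairs over a growing pool of short-to-long utterances."""
    ordered = sorted(utterances, key=lambda u: u["duration"])
    for epoch in range(epochs):
        # At least one full batch, widening toward the whole dataset.
        cutoff = max(batch_size, int(len(ordered) * (epoch + 1) / epochs))
        pool = ordered[:cutoff]
        for i in range(0, len(pool), batch_size):
            yield epoch, pool[i:i + batch_size]
```

In a real fine-tuning loop each batch would be fed to the acoustic model's training step; here the scheduler is kept framework-agnostic.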