JDHASA 2021
DOI: 10.55492/dhasa.v3i03.3820

Investigating the feasibility of harvesting broadcast speech data to develop resources for South African languages

Abstract: Sufficient target language data remains an important factor in the development of automatic speech recognition (ASR) systems. For instance, the substantial improvement in acoustic modelling that deep architectures have recently achieved for well-resourced languages requires vast amounts of speech data. Moreover, the acoustic models in state-of-the-art ASR systems that generalise well across different domains are usually trained on various corpora, not just one or two. Diverse corpora containing hundreds of hou…

Cited by 2 publications (6 citation statements)

“…In Badenhorst & de Wet (2021) it was observed that acoustic match to the harvest data is an important factor for transcription accuracy. The four baseline sub-word systems that were evaluated in this study confirmed this observation.…”
Section: Discussion (mentioning; confidence: 99%)

“…These indoor speeches were similar to radio … A relatively large Afr News test set could be selected from an existing Afr corpus (De Wet et al 2011). In addition, the Afr Messages data, introduced in Badenhorst & de Wet (2021), was included as an example of Afr studio speeches.…”
Section: Test Data (mentioning; confidence: 99%)

“…The feasibility of harvesting radio broadcast speech data for the development of ASR systems for South African languages was first evaluated in Badenhorst & de Wet (2021). In this work, a semiautomatic data harvesting procedure was proposed.…”
Section: Previous Work (mentioning; confidence: 99%)