2022
DOI: 10.3390/app13010326

Automatic Speech Recognition for Uyghur, Kazakh, and Kyrgyz: An Overview

Abstract: With the emergence of deep learning, the performance of automatic speech recognition (ASR) systems has remarkably improved. Especially for resource-rich languages such as English and Chinese, commercial usage has been made feasible in a wide range of applications. However, most languages are low-resource languages, presenting three main difficulties for the development of ASR systems: (1) the scarcity of the data; (2) the uncertainty in the writing and pronunciation; (3) the individuality of each language. Uyg…

Cited by 8 publications (4 citation statements)
References 150 publications

“…The various nuances and complexity in the grammar formation and vocabularies of these divergent languages however affect the output of these E2E low-resource ASR models such that [18] submits that the most advanced ASR models are constrained when it comes to low-resourced languages. [19] identified limited availability of speech and text data, absence of standardization with variations in pronunciation as well as the unique properties each language possesses as shown in their linguistic and phonetic composition as the three primary challenges ASR for low-resourced languages face. To produce accurate transcripts from speech patterns identified in a specific input language speech, ASR models require a substantial amount of training data.…”
Section: A Asr (mentioning)
confidence: 99%
“…A number of models [38, 39, 40, 41, 42, 43, 44] have been created for the speech recognition of the Kazakh language. The complexity with regard to Kazakh, its distinctive features, the scarcity of emotional speech datasets, and other factors make it difficult to develop a model for emotional speech detection in this language.…”
Section: Related Work (mentioning)
confidence: 99%
“…In the considered problem, training an E2E multi-task speech recognition model consists of the following parts: expanding the sequence of Kazakh characters, dialect characters, and speaker identification as output targets. In the training, we use speaker and dialect identifiers [30, 31, 32].…”
Section: Multi-task Learning For Kazakh Language (mentioning)
confidence: 99%
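
One plausible reading of the last statement is that the character-level target sequence is extended with speaker and dialect identifier tokens, so a single end-to-end model learns to emit all three. The sketch below illustrates only that reading and is not the cited authors' implementation: the function name `build_multitask_target`, the `<spk:...>` and `<dia:...>` token formats, the `<space>` symbol, and the example IDs are all assumptions made for illustration.

```python
from typing import List


def build_multitask_target(text: str, speaker_id: str, dialect_id: str) -> List[str]:
    """Return a character-level target prefixed with speaker and dialect tokens.

    Hypothetical helper: the token formats here are illustrative assumptions,
    not taken from the cited paper.
    """
    # Split the transcript into characters, making word boundaries explicit.
    chars = ["<space>" if ch == " " else ch for ch in text.strip()]
    # Prepend the identifier tokens so they become part of the output targets.
    return [f"<spk:{speaker_id}>", f"<dia:{dialect_id}>"] + chars


if __name__ == "__main__":
    # Example transcript and IDs are made up for demonstration.
    print(build_multitask_target("сәлем әлем", speaker_id="S001", dialect_id="D1"))
```

Placing the identifiers at the start of the sequence is a design choice in this sketch; a separate classification head per task is another common way to set up multi-task training.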