Automatic Speech Recognition for Uyghur, Kazakh, and Kyrgyz: An Overview

Du, Wenqiang; Maimaitiyiming, Yikeremu; Nijat, Mewlude; Li, Lantian; Hamdulla, Askar; Wang, Dong

doi:10.3390/app13010326

Cited by 8 publications

(4 citation statements)

References 150 publications

“…The various nuances and complexity in the grammar formation and vocabularies of these divergent languages however affect the output of these E2E low-resource ASR models such that [18] submits that the most advanced ASR models are constrained when it comes to low-resourced languages. [19] identified limited availability of speech and text data, absence of standardization with variations in pronunciation as well as the unique properties each language possesses as shown in their linguistic and phonetic composition as the three primary challenges ASR for low-resourced languages face. To produce accurate transcripts from speech patterns identified in a specific input language speech, ASR models require a substantial amount of training data.…”

Section: A ASR (mentioning)

confidence: 99%

Towards Yoruba-Speaking Google Maps Navigation

Oyesanmi, Olukanmi 2024

Preprint

Advances in natural language processing (NLP) have made several technological interventions and services available to people in different languages. One such service is the Google Maps direction narration, which provides real-time oral assistance to tourists and visitors in a new or unknown location. Like most related assistive technologies, this service was developed primarily in English, with support for some other Western languages added over time, while African languages have been largely neglected. This paper seeks to leverage advances in NLP techniques and models to design a speech-to-speech (STS) translation of the Google Maps direction narration from English to Yoruba, one of the most widely spoken languages in Western Africa. We begin with an exploration of various state-of-the-art NLP techniques for the Automatic Speech Recognition (ASR), Machine Translation (MT), and Text-to-Speech (TTS) models that make up the designed system. We present the performance of the models we explored towards the design and implementation of a robust STS translation of the Google Maps direction narration in the Yoruba language.

“…A number of models [38, 39, 40, 41, 42, 43, 44] have been created for the speech recognition of the Kazakh language. The complexity with regard to Kazakh, its distinctive features, the scarcity of emotional speech datasets, and other factors make it difficult to develop a model for emotional speech detection in this language.…”

Section: Related Work (mentioning)

confidence: 99%

Continuous Sign Language Recognition and Its Translation into Intonation-Colored Speech

Amangeldy, Ukenova, Bekmanova et al. 2023

Sensors

This article is devoted to solving the problem of converting sign language into a consistent text with intonation markup for subsequent voice synthesis of sign phrases by speech with intonation. The paper proposes an improved method of continuous recognition of sign language, the results of which are transmitted to a natural language processor based on analyzers of morphology, syntax, and semantics of the Kazakh language, including morphological inflection and the construction of an intonation model of simple sentences. This approach has significant practical and social significance, as it can lead to the development of technologies that will help people with disabilities to communicate and improve their quality of life. As a result of the cross-validation of the model, we obtained an average test accuracy of 0.97 and an average val_accuracy of 0.90 for model evaluation. We also identified 20 sentence structures of the Kazakh language with their intonational model.

“…In the considered problem, training an E2E multi-task speech recognition model consists of the following parts: expanding the sequence of Kazakh characters, dialect characters, and speaker identification as output targets. In the training, we use speaker and dialect identifiers [30, 31, 32].…”

Section: Multi-task Learning For Kazakh Language (mentioning)

confidence: 99%

Neurorecognition visualization in multitask end-to-end speech

Mamyrbayev, Pavlov, Bekarystankyzy et al. 2023

Optical Fibers and Their Applications 2023

Nowadays, speech-processing technologies for different language systems are successfully used in mobile and stationary devices. Kazakh is considered a low-resource language, which poses various challenges for conventional speech recognition methods. This paper presents a model capable of multitasking: it handles speech recognition, dialect identification, and speaker identification concurrently, all in an end-to-end framework. The developed multitask model enables training three different tasks within a single model. The multitask recognition system is built on the WaveNet-CTC model. Experiments show that, for this specific task, the end-to-end multitask model performs better than the other models.
