A Crowdsourced Open-Source Kazakh Speech Corpus and Initial Speech Recognition Baseline

Khassanov, Yerbolat; Mussakhojayeva, Saida; Mirzakhmetov, Almas; Adiyev, Alen; Nurpeiissov, Mukhamet; Varol, Hüseyin Atakan

doi:10.18653/v1/2021.eacl-main.58

Cited by 26 publications

(13 citation statements)

References 21 publications

“…In [17], a 335 h corpus for the Kazakh language was presented. As a result of the experiment, they showed that a sufficiently large training data set significantly improves the performance of a speech recognition system based on an end-to-end model compared to hybrid ones.…”

Section: Literature Review and Problem Statement (mentioning)

confidence: 99%

Identifying the influence of transfer learning method in developing an end-to-end automatic speech recognition system with a low data level

Mamyrbayev

Alimhan

Oralbekova

et al. 2022

EEJET

Today, the best quality and performance in speech technology depend on the widespread use of machine learning methods. This project studies and implements an end-to-end automatic speech recognition system using machine learning methods, and develops new mathematical models and algorithms for automatic speech recognition of agglutinative (Turkic) languages. Many research papers have shown that deep learning methods make it easier to train automatic speech recognition systems that use an end-to-end approach. Such a system can also be trained directly, that is, without manual work with raw signals. Despite its good recognition quality, the end-to-end model has a drawback: it needs a large amount of training data. This is a serious problem for low-resource languages, especially Turkic languages such as Kazakh and Azerbaijani. Various methods can be applied to address it. Some are used for end-to-end speech recognition of languages belonging to the same family (agglutinative languages). Transfer learning is the method used for low-resource languages, and multi-task learning for languages with large resources. To increase efficiency and quickly address the limited-resource problem, transfer learning was applied to the end-to-end model. The transfer learning method helped to fit a model trained on the Kazakh dataset to the Azerbaijani dataset. Thereby, two language corpora were trained simultaneously. Experiments with the two corpora show that transfer learning can reduce the symbol error rate, phoneme error rate (PER), by 14.23% compared to baseline models (DNN+HMM, WaveNet, and CNC+LM). Therefore, the implemented model with the transfer method can be used to recognize other low-resource languages.

“…Among the aforementioned three languages, Russian and English are considered resource-rich, i.e., a large number of annotated datasets exist [2,6,31] and extensive studies have been conducted, both in monolingual and multilingual settings [4,25,28]. On the other hand, Kazakh is considered a low-resource language, where annotated datasets and speech processing research have emerged only in recent years [19,26]. The authors of [19] presented the first crowdsourced open-source Kazakh speech corpus and conducted initial Kazakh speech recognition experiments on both DNN-HMM and E2E architectures.…”

Section: Related Work (mentioning)

confidence: 99%

“…On the other hand, Kazakh is considered a low-resource language, where annotated datasets and speech processing research have emerged only in recent years [19,26]. The authors of [19] presented the first crowdsourced open-source Kazakh speech corpus and conducted initial Kazakh speech recognition experiments on both DNN-HMM and E2E architectures. Similarly, the authors of [26] presented the first publicly available speech synthesis dataset for Kazakh.…”

Section: Related Work (mentioning)

confidence: 99%

“…For Kazakh, we used the recently presented open-source Kazakh Speech Corpus (KSC) [19]. The KSC contains around 332 hours of transcribed audio crowdsourced through the Internet, where volunteers from different regions and age groups were asked to read sentences presented through a web browser.…”

Section: The Kazakh Language (mentioning)

confidence: 99%

“…In the KSC, all texts are represented using the Cyrillic alphabet, and audio recordings are stored in the WAV format. For the training, validation, and test sets, we used the standard split of non-overlapping speakers provided in [19].…”

Section: The Kazakh Language (mentioning)

confidence: 99%

A Study of Multilingual End-to-End Speech Recognition for Kazakh, Russian, and English

Mussakhojayeva

Khassanov

Varol

2021

Preprint

Self Cite

We study training a single end-to-end (E2E) automatic speech recognition (ASR) model for three languages used in Kazakhstan: Kazakh, Russian, and English. We first describe the development of multilingual E2E ASR based on Transformer networks and then perform an extensive assessment on the aforementioned languages. We also compare two variants of output grapheme set construction: combined and independent. Furthermore, we evaluate the impact of LMs and data augmentation techniques on the recognition performance of the multilingual E2E ASR. In addition, we present several datasets for training and evaluation purposes. Experiment results show that the multilingual models achieve comparable performances to the monolingual baselines with a similar number of parameters. Our best monolingual and multilingual models achieved 20.9% and 20.5% average word error rates on the combined test set, respectively. To ensure the reproducibility of our experiments and results, we share our training recipes, datasets, and pre-trained models.

USC: An Open-Source Uzbek Speech Corpus and Initial Speech Recognition Experiments

Musaev

Mussakhojayeva

Khujayorov

et al. 2021

Lecture Notes in Computer Science
