USC: An Open-Source Uzbek Speech Corpus and Initial Speech Recognition Experiments

Musaev, Muhammadjon; Mussakhojayeva, Saida; Khujayorov, Ilyos; Khassanov, Yerbolat; Ochilov, Mannon; Varol, Hüseyin Atakan

doi:10.1007/978-3-030-87802-3_40

Cited by 18 publications

(8 citation statements)

References 24 publications


“…To conduct training and testing of neural networks, datasets from the Institute of Smart Systems and Artificial Intelligence (ISSAI) of Nazarbayev University were used, namely the Kazakh speech corpus [22], Russian speech corpus [23], Turkish language corpus [24], and Uzbek language corpus [25]. One of the largest open datasets, the Common Voice Dataset [26], was also used, namely, the corpus of the Kyrgyz language, the corpus of the English language, and the corpus of the French language.…”

Section: Materials and Methods Of Research (mentioning)

confidence: 99%

The dependence of the effectiveness of neural networks for recognizing human voice on language

Nurlankyzy, Akhmediyarova, Zhetpisbayeva et al. 2024, EEJET


This study examines the effectiveness of neural network architectures (multilayer perceptron, MLP; convolutional neural network, CNN; recurrent neural network, RNN) for human voice recognition, with an emphasis on the Kazakh language. Problems related to language, differences between speakers, and the influence of network architecture on recognition accuracy are considered. The methodology includes extensive training and testing, studying recognition accuracy across different languages and across different sets of speaker data. Using a comparative analysis, this study evaluates the performance of three architectures trained exclusively on Kazakh. Testing included utterances in Kazakh and other languages, while the number of speakers was varied to assess its impact on recognition accuracy. The results showed that CNNs are more effective at recognizing the human voice than RNNs and MLPs. The CNN also had higher accuracy in recognizing the human voice in Kazakh, for both small and large numbers of speakers. For example, for 20 speakers, the recognition error in Russian was 21.86 %, whereas in Kazakh it was 10.6 %. A similar trend was observed for 80 speakers: 16.2 % for Russian and 8.3 % for Kazakh. It can also be argued that training on one language does not guarantee high recognition accuracy in other languages. Therefore, the accuracy of human voice recognition by neural networks depends significantly on the language in which training is conducted. In addition, this study highlights the importance of different sets of speaker data for achieving optimal results. This knowledge is crucial for advancing the development of reliable human voice recognition systems that can accurately identify different human voices in different language contexts.


“…And here the question of the correctness of the material on which a medical GPT is trained becomes very important. In this regard, there is a good evidence-based example of developing a corpus for model training [21,22].…”

Section: Methodology of applying GPT-4: the viewpoint of engineer-develo... (unclassified)

Do Medical Professionals Need GPT-4: An Analysis of Current World Experience (Нужен Ли Медикам GPT-4: Анализ Актуального Мирового Опыта)

Adylova, Davronov 2024, Международный журнал теоретических и прикладных вопросов цифров


Large language models (LLMs) have demonstrated remarkable capabilities in understanding and generating natural language across various domains, including medicine. The article presents an assessment of GPT-4 from two perspectives on the problem of applying this language model: that of developers from OpenAI and Microsoft, and that of medical users from two European projects. Over the past few years, LLMs trained on massive interdisciplinary corpora have become powerful building blocks for creating systems aimed at solving specific tasks. The article considers three tasks: medical education, the performance of ChatGPT-4 in the clinic (consultations, recording transcripts of doctor-patient conversations), and specific levels of diagnostic accuracy (in different areas of medicine). The world already has an answer to the question of whether a medical GPT is needed, and that answer is yes.


“…Several datasets and resources have been introduced to address the challenges posed by low-resource languages. Examples include the Kazakh Speech Corpus (KSC) [21] and its updated version [22], the THUYG-20 database for Uyghur [23], the Uzbek Speech Corpus (USC) [24], and the Turkish Speech Corpus (TSC) [10]. Kazakh, Kyrgyz, and Uyghur are also present in larger multilingual corpora (e.g., M2ASR [25]).…”

Section: B. ASR For Low-resource Languages (mentioning)

confidence: 99%

“…We used several data sources to compile the training corpora, including Common Voice (CVC) [33] version 13.0, KSC2 [22], TSC [10], USC [24], and FLEURS [34]. Among these, CVC stands out as one of the largest publicly available multilingual datasets that encompasses a wide variety of accents, demographics, and recording environments.…”

Section: A. Datasets (mentioning)

confidence: 99%

Noise-Robust Multilingual Speech Recognition and the Tatar Speech Corpus

Mussakhojayeva, Gilmullin, Khakimov et al. 2024, Preprint


After focusing on individual languages for a long time, multilingual automatic speech recognition has recently become an active area of research. For instance, Whisper by OpenAI is capable of recognizing speech in 99 languages. However, the performance of Whisper is significantly lower for low-resource languages than for high-resource ones. In this work, we aim to address this and present a fine-tuning strategy for the pretrained Whisper model so that its performance is improved for a low-resource language family while maintaining performance for a set of high-resource languages. Specifically, our Söyle model exhibited high performance for both the Turkic language family (11 languages) and the official languages of the United Nations. Our work also presents the first large open-source speech corpus for the Tatar language. We demonstrate that speech recognition performance for Tatar improves with the model trained using the new Tatar Speech Corpus (TatSC). Our model is also trained to be noise-robust. We open-source our model and TatSC to encourage further research. We envision that our fine-tuning approach will guide the creation of multilingual speech recognition models for other low-resource language families.
