2021
DOI: 10.1007/978-3-030-87802-3_40

USC: An Open-Source Uzbek Speech Corpus and Initial Speech Recognition Experiments

citations
Cited by 18 publications
(8 citation statements)
references
References 24 publications
“…To conduct training and testing of neural networks, datasets from the Institute of Smart Systems and Artificial Intelligence (ISSAI) of Nazarbayev University were used, namely the Kazakh speech corpus [22], Russian speech corpus [23], Turkish language corpus [24], and Uzbek language corpus [25]. One of the largest open datasets, the Common Voice Dataset [26], was also used, namely, the corpus of the Kyrgyz language, the corpus of the English language, and the corpus of the French language.…”
Section: Materials and Methods of Research (mentioning)
confidence: 99%
“…Here the correctness of the material on which a medical GPT is trained becomes a very important question. In this regard, there is a good evidence-based example of developing a corpus for model training [21,22].…”
Section: Methodology for applying GPT-4: the viewpoint of engineer-develop... (unclassified)
“…Several datasets and resources have been introduced to address the challenges posed by low-resource languages. Examples include the Kazakh Speech Corpus (KSC) [21] and its updated version [22], the THUYG-20 database for Uyghur [23], the Uzbek Speech Corpus (USC) [24], and the Turkish Speech Corpus (TSC) [10]. Kazakh, Kyrgyz, and Uyghur are also present in larger multilingual corpora (e.g., M2ASR [25]).…”
Section: B. ASR for Low-Resource Languages (mentioning)
confidence: 99%
“…We used several data sources to compile the training corpora, including Common Voice (CVC) [33] version 13.0, KSC2 [22], TSC [10], USC [24], and FLEURS [34]. Among these, CVC stands out as one of the largest publicly available multilingual datasets that encompasses a wide variety of accents, demographics, and recording environments.…”
Section: A. Datasets (mentioning)
confidence: 99%