2019
DOI: 10.1007/978-3-030-26061-3_12
|View full text |Cite
|
Sign up to set email alerts
|

RUSLAN: Russian Spoken Language Corpus for Speech Synthesis

Abstract: We present RUSLAN -a new open Russian spoken language corpus for the text-to-speech task. RUSLAN contains 22200 audio samples with text annotations -more than 31 hours of high-quality speech of one person -being the largest annotated Russian corpus in terms of speech duration for a single speaker. We trained an end-to-end neural network for the text-to-speech task on our corpus and evaluated the quality of the synthesized speech using Mean Opinion Score test. Synthesized speech achieves 4.05 score for naturaln… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
4

Citation Types

0
2
0

Year Published

2019
2019
2023
2023

Publication Types

Select...
3
1
1

Relationship

0
5

Authors

Journals

citations
Cited by 5 publications
(4 citation statements)
references
References 16 publications
0
2
0
Order By: Relevance
“…These corpora prioritise the measurement of dataset quality across various dimensions. Regarding signal quality, the Signal-to-Noise Ratio (SNR) holds significant importance, both during content filtering [36,37] and data recording stages [38][39][40][41]. Linguistic considerations also come into play, with some researchers emphasising the need for balanced phonemic or supraphonemic units within the dataset [38,39,41,42].…”
Section: Related Workmentioning
confidence: 99%
See 1 more Smart Citation
“…These corpora prioritise the measurement of dataset quality across various dimensions. Regarding signal quality, the Signal-to-Noise Ratio (SNR) holds significant importance, both during content filtering [36,37] and data recording stages [38][39][40][41]. Linguistic considerations also come into play, with some researchers emphasising the need for balanced phonemic or supraphonemic units within the dataset [38,39,41,42].…”
Section: Related Workmentioning
confidence: 99%
“…Additionally, text preprocessing techniques are employed to ensure accurate alignment with the uttered speech and reduce variability in pronunciations [36,[39][40][41][42][43][44]. Lastly, the quantity of audio data generated by each speaker is a critical aspect in corpus creation, particularly in datasets with a low number of speakers [36,[38][39][40][41][42][43][44].…”
Section: Related Workmentioning
confidence: 99%
“…These corpora prioritise the measurement of dataset quality across various dimensions. Regarding signal quality, the SNR holds significant importance, both during the content filtering [36,37] and data recording stages [38][39][40][41]. Linguistic considerations also come into play, with some researchers emphasising the need for balanced phonemic or supraphonemic units within the dataset [38,39,41,42].…”
Section: Related Workmentioning
confidence: 99%
“…Additionally, text preprocessing techniques are employed to ensure accurate alignment with the uttered speech and to reduce variability in pronunciations [36,[39][40][41][42][43][44]. Lastly, the quantity of audio data generated by each speaker is a critical aspect in corpus creation, particularly in datasets with a low number of speakers [36,[38][39][40][41][42][43][44].…”
Section: Related Workmentioning
confidence: 99%