Interspeech 2020
DOI: 10.21437/interspeech.2020-2664

Efficient Neural Speech Synthesis for Low-Resource Languages Through Multilingual Modeling

Abstract: Recent advances in neural TTS have led to models that can produce high-quality synthetic speech. However, these models typically require large amounts of training data, which can make it costly to produce a new voice with the desired quality. Although multi-speaker modeling can reduce the data requirements necessary for a new voice, this approach is usually not viable for many low-resource languages for which abundant multi-speaker data is not available. In this paper, we therefore investigated to what extent …

Cited by 14 publications (10 citation statements)
References 18 publications (32 reference statements)
“…We obtained MLME values from the following studies: [6], [12], [13], [14], [16], [17], [18], [20], [22], [25], [26], and [27], and reported them in Table 3, both as a whole and in specific groups of evaluation metrics, in the form of median (M) and interquartile range (IQR). Also reported are the p-values of the corresponding one-sample Wilcoxon signed rank tests for the hypothesis that the median MLME values are larger than 0.…”
Section: Results (mentioning)
confidence: 99%
“…These resulting values (n = 880) were used for analysis. [6], [7], [8], [9], [10], [11], [12] Hidden Markov Model synthesis (HMM) 7 [12], [13], [14], [15], [16], [17], [18] Neural network (non-S2S) synthesis (DNN) 9 [19], [20], [21], [22], [23], [24], [25], [26], [27] Sequence-to-sequence synthesis (S2S)…”
Section: Characteristics of the Included Studies (mentioning)
confidence: 99%
“…On the other side of the spectrum, TTS has come a long way from sounding somewhat robotic to more natural voices. Even so, prosody can still sound off in conversations and TTS components seems underdeveloped for other languages than English, though efforts have been made for multilingual modeling for TTS (De Korte et al, 2020). Also non-verbal elements of conversational speech such as backchanneling and laughter are usually prerecorded for TTS systems.…”
Section: Chapter (mentioning)
confidence: 99%