ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp43922.2022.9746291
Distribution Augmentation for Low-Resource Expressive Text-To-Speech

Cited by 4 publications (1 citation statement)
References 13 publications
“…In parallel, we train a duration model which, at inference time, predicts the duration of each phoneme given the phoneme sequence. The duration model, as in [21,27], consists of a stack of 3 convolution layers with 512 channels, a kernel size of 5, and 30% dropout, followed by a Bi-LSTM layer and a linear dense layer. To produce speech, we vocode the mel-spectrogram frames using a universal vocoder [28].…”
Section: Non-Attentive TTS Architecture
Confidence: 99%
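The quoted architecture (3 convolution layers with 512 channels, kernel size 5, 30% dropout, a Bi-LSTM, and a linear output layer) can be sketched in PyTorch as below. This is a minimal illustration, not the authors' implementation: the phoneme vocabulary size, embedding dimension, LSTM hidden size, and the use of ReLU/BatchNorm between convolutions are assumptions not stated in the cited text.

```python
import torch
import torch.nn as nn


class DurationModel(nn.Module):
    """Sketch of a phoneme duration predictor: 3 conv layers -> Bi-LSTM -> linear."""

    def __init__(self, n_phonemes=100, emb_dim=512, channels=512,
                 kernel_size=5, dropout=0.3, lstm_hidden=256):
        super().__init__()
        # Embedding size and vocabulary size are illustrative assumptions.
        self.embedding = nn.Embedding(n_phonemes, emb_dim)
        layers = []
        in_ch = emb_dim
        for _ in range(3):  # 3 conv layers, 512 channels, kernel size 5, 30% dropout
            layers += [
                nn.Conv1d(in_ch, channels, kernel_size, padding=kernel_size // 2),
                nn.ReLU(),
                nn.BatchNorm1d(channels),
                nn.Dropout(dropout),
            ]
            in_ch = channels
        self.convs = nn.Sequential(*layers)
        # Bi-LSTM layer; hidden size is an assumption.
        self.lstm = nn.LSTM(channels, lstm_hidden, batch_first=True,
                            bidirectional=True)
        # Linear dense layer producing one duration value per phoneme.
        self.proj = nn.Linear(2 * lstm_hidden, 1)

    def forward(self, phoneme_ids):
        # phoneme_ids: (batch, seq_len) integer phoneme indices
        x = self.embedding(phoneme_ids)      # (B, T, emb_dim)
        x = self.convs(x.transpose(1, 2))    # (B, channels, T)
        x, _ = self.lstm(x.transpose(1, 2))  # (B, T, 2 * lstm_hidden)
        return self.proj(x).squeeze(-1)      # (B, T) predicted durations


model = DurationModel()
durations = model(torch.randint(0, 100, (2, 17)))
print(durations.shape)  # one predicted duration per phoneme position
```

At inference time, a model of this shape maps the phoneme sequence to per-phoneme durations, which the non-attentive TTS acoustic model then uses to expand phoneme encodings to frame level before mel-spectrogram prediction.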
“…In parallel, we train a duration model which will predict at inference time the duration of each phoneme given the phoneme sequence. The duration model as in [21,27] consists of a stack of 3 convolution layers with 512 channels, kernel size of 5 and a dropout of 30%, a Bi-LSTM layer and a linear dense layer. To produce speech, we vocode the mel-spectrograms frame using a universal vocoder [28].…”
Section: Non-attentive Tts Architecturementioning
confidence: 99%