2021
DOI: 10.48550/arxiv.2106.15153
Preprint

GANSpeech: Adversarial Training for High-Fidelity Multi-Speaker Speech Synthesis

Abstract: Recent advances in neural multi-speaker text-to-speech (TTS) models have enabled the generation of reasonably good speech quality with a single model and made it possible to synthesize speech for a speaker with limited training data. Fine-tuning the multi-speaker model to the target speaker's data can achieve better quality; however, a gap still exists compared to real speech samples, and the resulting model is speaker-dependent. In this work, we propose GANSpeech, which is a high-fidelity multi-speaker T…

Cited by 2 publications (2 citation statements)
References 25 publications
“…The discriminator structure is modeled and represented by D ϕ (x t−1 , x t , t, s) with learnable parameters ϕ. The discriminator uses joint conditional and unconditional loss (JCU) [26], which combines conditional and unconditional adversarial losses to further improve the accuracy of the mel-spectrogram and speech waveform mapping.…”
Section: Diffusion Decoder and Discriminator
confidence: 99%
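The joint conditional and unconditional (JCU) loss quoted above combines two adversarial terms: one score produced by the discriminator alone, and one conditioned on auxiliary features. A minimal numpy sketch of the least-squares (LSGAN-style) form of this objective is below; the function names and the choice of least-squares loss are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def jcu_discriminator_loss(d_uncond_real, d_uncond_fake,
                           d_cond_real, d_cond_fake):
    """Least-squares JCU discriminator objective (illustrative sketch).

    The discriminator emits both an unconditional score and a score
    conditioned on auxiliary features; real scores are pushed toward 1
    and fake scores toward 0, summing the two branches.
    """
    loss_real = (np.mean((d_uncond_real - 1.0) ** 2)
                 + np.mean((d_cond_real - 1.0) ** 2))
    loss_fake = (np.mean(d_uncond_fake ** 2)
                 + np.mean(d_cond_fake ** 2))
    return loss_real + loss_fake

def jcu_generator_loss(d_uncond_fake, d_cond_fake):
    """Generator side: push both fake scores toward 1."""
    return (np.mean((d_uncond_fake - 1.0) ** 2)
            + np.mean((d_cond_fake - 1.0) ** 2))

# A discriminator that scores real samples as 1 and fakes as 0
# achieves zero loss on both branches.
d_loss = jcu_discriminator_loss(np.ones(4), np.zeros(4),
                                np.ones(4), np.zeros(4))
```

Summing the conditional and unconditional branches lets the conditional term sharpen the mel-spectrogram mapping while the unconditional term stabilizes overall sample realism.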
“…In addition, speech synthesis has adopted other generative models that have also achieved very good performance. Flow-based models are found in [15, 16, 17], variational autoencoder (VAE)-based models are listed in [17, 18], generative adversarial network (GAN)-based models are presented in [19], and diffusion process-based models are described in [20, 21, 22, 23].…”
Section: Introduction
confidence: 99%