2019 27th European Signal Processing Conference (EUSIPCO) 2019
DOI: 10.23919/eusipco.2019.8903099

WGANSing: A Multi-Voice Singing Voice Synthesizer Based on the Wasserstein-GAN

Abstract: We present a deep neural network based singing voice synthesizer, inspired by the Deep Convolutional Generative Adversarial Networks (DCGAN) architecture and optimized using the Wasserstein-GAN algorithm. We use vocoder parameters for acoustic modelling, to separate the influence of pitch and timbre. This facilitates the modelling of the large variability of pitch in the singing voice. Our network takes a block of consecutive frame-wise linguistic and fundamental frequency features, along with global singer ide…
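
The abstract names the Wasserstein-GAN algorithm but the excerpt does not show the objective. As a rough illustration only, the sketch below computes the standard Wasserstein critic and generator losses from plain lists of critic scores. The function names and the toy scores are assumptions made for this example; the paper's actual training objective may include further terms that are not visible here.

```python
from statistics import fmean
from typing import Sequence


def critic_loss(real_scores: Sequence[float], fake_scores: Sequence[float]) -> float:
    """Standard WGAN critic loss (to be minimized).

    The critic tries to score real samples high and generated samples low,
    so it minimizes mean(fake) - mean(real).
    """
    return fmean(fake_scores) - fmean(real_scores)


def generator_loss(fake_scores: Sequence[float]) -> float:
    """Standard WGAN generator loss (to be minimized).

    The generator tries to raise the critic's score on its own samples.
    """
    return -fmean(fake_scores)


# Hypothetical critic outputs for a batch of two real and two generated samples.
real = [1.0, 3.0]
fake = [0.0, 1.0]
c_loss = critic_loss(real, fake)
g_loss = generator_loss(fake)
```

In a real system the scores would come from a critic network, and the original WGAN formulation additionally constrains the critic (for example by weight clipping), which this sketch leaves out.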

Cited by 51 publications
(50 citation statements)
References 13 publications
“…Non-Seq2Seq singing synthesizers include those based on autoregressive architectures [17,21,22], feed-forward CNN [23], and feed-forward GAN-based approaches [24,25].…”
Section: Relation To Prior Work (mentioning)
confidence: 99%
“…Researches to extend the SVS system to the multi-singer system has been conducted relatively recently. [4] proposes a method of expressing each singer's identity by one-hot embedding. This method is straightforward and simple, but has the limitation of requiring re-training each time to add a new singer.…”
Section: Multi-singer SVS System (mentioning)
confidence: 99%
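
The limitation described in the statement above can be seen in a small sketch. It assumes a hypothetical roster of three singers and a helper named `one_hot`, neither of which comes from the cited papers. Because the identity vector has one slot per known singer, adding a singer changes its length, so any layer that consumes it would need a different input size.

```python
from typing import Dict, List


def one_hot(singer_index: int, num_singers: int) -> List[float]:
    """Return a vector of length num_singers with a single 1.0 at singer_index."""
    if not 0 <= singer_index < num_singers:
        raise ValueError("singer_index must be in [0, num_singers)")
    vec = [0.0] * num_singers
    vec[singer_index] = 1.0
    return vec


# Hypothetical training roster; the names are placeholders.
singers = ["singer_a", "singer_b", "singer_c"]
codes: Dict[str, List[float]] = {
    name: one_hot(i, len(singers)) for i, name in enumerate(singers)
}

# Adding a fourth singer lengthens every identity vector from 3 to 4,
# which no longer matches the input size the network was trained with.
new_code = one_hot(3, len(singers) + 1)
```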
“…The multi-singer SVS system should not only produce natural pronunciation and pitch contour but also suitably reflect the identity of a particular singer. To achieve this, methods for adding conditional inputs reflecting the singer's identity to the network have been proposed [4,5].…”
Section: Introduction (mentioning)
confidence: 99%
“…Previous works on SVS include lyrics-to-singing alignment [6,10,12], parametric synthesis [1,19], acoustic modeling [24,27,29], and adversarial synthesis [5,15,21]. Although they achieve reasonably good performance, these systems typically require 1) a large amount of high-quality singing recordings as training data, and 2) strict data alignments between lyrics and singing audio for accurate singing modeling, both of which incur considerable data labeling cost.…”
Section: Introduction (mentioning)
confidence: 99%
“…Singing Voice Synthesis. Previous works have conducted studies on SVS from different aspects, including lyrics-to-singing alignment [6,10,12], parametric synthesis [1,19], acoustic modeling [27,29], and adversarial synthesis [5,15,21]. Blaauw and Bonada [1] leverage the WaveNet architecture and separates the influence of pitch and timbre for parametric singing synthesis.…”
Section: Introduction (mentioning)
confidence: 99%