2019 27th European Signal Processing Conference (EUSIPCO)
DOI: 10.23919/eusipco.2019.8903122
Analysing Deep Learning-Spectral Envelope Prediction Methods for Singing Synthesis

Abstract: We conduct an investigation of various hyperparameters of neural networks used to generate spectral envelopes for singing synthesis. Two perceptual tests are performed: the first compares two models directly, the other ranks models with a mean opinion score. With these tests we show that when learning to predict spectral envelopes, 2d-convolutions are superior to previously proposed 1d-convolutions and that predicting multiple frames in an iterated fashion during training is superior to in…
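To make the 1d- versus 2d-convolution distinction in the abstract concrete, here is a minimal PyTorch sketch (not from the paper; the tensor shapes, bin count, and layer sizes are illustrative assumptions). The 1d variant treats the spectral-envelope bins as channels and convolves along time only, while the 2d variant treats the frame sequence as a single-channel image so the kernel also slides along the frequency axis.

```python
# Sketch only (assumed shapes, not the authors' code): contrasting 1D and 2D
# convolutions over a batch of spectral-envelope sequences.
import torch
import torch.nn as nn

batch, n_frames, n_bins = 8, 128, 60            # assumed dimensions
frames = torch.randn(batch, n_frames, n_bins)   # (batch, time, frequency)

# 1D view: frequency bins become channels; the kernel slides along time only.
conv1d = nn.Conv1d(in_channels=n_bins, out_channels=n_bins,
                   kernel_size=3, padding=1)
out_1d = conv1d(frames.transpose(1, 2))         # (batch, n_bins, n_frames)

# 2D view: the sequence is a 1-channel image, so the kernel also moves along
# the frequency axis and can exploit its natural topology.
conv2d = nn.Conv2d(in_channels=1, out_channels=16,
                   kernel_size=(3, 3), padding=1)
out_2d = conv2d(frames.unsqueeze(1))            # (batch, 16, n_frames, n_bins)

print(out_1d.shape, out_2d.shape)
```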

Cited by 3 publications (2 citation statements)
References 18 publications
“…The auto-encoder is implemented as a fully convolutional neural network, where the encoder and decoder resemble mirrored structures. To benefit from the natural topology of the frequency axis, we treat the sequence of mel-spectrograms as images and perform 2D convolutions on them [47]. Through strides in frequency direction, the encoder reduces the frequency axis to length 1 but keeps the time axis unchanged.…”
Section: Network Architecture
confidence: 99%
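The strided-frequency encoder described in this citing statement can be sketched as follows. This is a hedged illustration under assumed settings (64 mel bins, the channel counts, and the kernel sizes are not taken from the cited work): strides of 2 are applied only along the frequency axis so the time axis keeps its length, and a final convolution collapses the remaining frequency bins to length 1.

```python
# Sketch (assumptions only): strided 2D convolutions that collapse the
# frequency axis of a mel-spectrogram to length 1 while keeping time intact.
import torch
import torch.nn as nn

class FrequencyCollapsingEncoder(nn.Module):
    def __init__(self, n_mels=64, channels=(16, 32, 64)):
        super().__init__()
        layers, in_ch = [], 1
        for out_ch in channels:
            # stride 2 along frequency, stride 1 along time
            layers += [nn.Conv2d(in_ch, out_ch, kernel_size=(4, 3),
                                 stride=(2, 1), padding=(1, 1)),
                       nn.ReLU()]
            in_ch = out_ch
        # final convolution reduces the remaining frequency bins to 1
        layers.append(nn.Conv2d(in_ch, in_ch,
                                kernel_size=(n_mels // 2 ** len(channels), 1)))
        self.net = nn.Sequential(*layers)

    def forward(self, mel):        # mel: (batch, 1, n_mels, n_frames)
        return self.net(mel)       # -> (batch, C, 1, n_frames)

mel = torch.randn(2, 1, 64, 200)
print(FrequencyCollapsingEncoder()(mel).shape)  # torch.Size([2, 64, 1, 200])
```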
“…Non-Seq2Seq singing synthesizers include those based on autoregressive architectures [17,21,22], feed-forward CNN [23], and feed-forward GAN-based approaches [24,25].…”
Section: Relation To Prior Work
confidence: 99%