ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
DOI: 10.1109/icassp39728.2021.9414401

PeriodNet: A Non-Autoregressive Waveform Generation Model with a Structure Separating Periodic and Aperiodic Components

Abstract: We propose PeriodNet, a non-autoregressive (non-AR) waveform generation model with a new model structure for modeling periodic and aperiodic components in speech waveforms. Non-AR waveform generation models can generate speech waveforms in parallel and can be used as speech vocoders by conditioning on acoustic features. Since a speech waveform contains periodic and aperiodic components, both components should be appropriately modeled to generate a high-quality speech waveform. However, it is difficult to de…
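The abstract's premise, that a speech waveform contains both a periodic and an aperiodic component, can be illustrated at the signal level. The sketch below is not the PeriodNet architecture, which the truncated abstract does not describe. It only builds a toy excitation as the sum of an F0-driven sine (periodic part) and Gaussian noise (aperiodic part). The function names, the 16 kHz sample rate, the 200 Hz F0, and the noise scale are all assumptions chosen for illustration.

```python
import math
import random


def periodic_excitation(f0_hz, sample_rate):
    """Sine excitation following a per-sample F0 track (hypothetical helper).

    Samples with F0 <= 0 are treated as unvoiced and produce silence.
    """
    phase = 0.0
    out = []
    for f0 in f0_hz:
        if f0 > 0.0:
            # Advance the phase by one sample at the current F0.
            phase = (phase + 2.0 * math.pi * f0 / sample_rate) % (2.0 * math.pi)
            out.append(math.sin(phase))
        else:
            out.append(0.0)
    return out


def aperiodic_excitation(num_samples, scale=0.1, seed=0):
    """Gaussian noise standing in for the aperiodic part (hypothetical helper)."""
    rng = random.Random(seed)
    return [scale * rng.gauss(0.0, 1.0) for _ in range(num_samples)]


def mix(periodic, aperiodic):
    """Sum the two components sample by sample."""
    return [p + a for p, a in zip(periodic, aperiodic)]


# Assumed toy input: 50 ms voiced at 200 Hz, then 50 ms unvoiced, at 16 kHz.
sample_rate = 16000
f0_track = [200.0] * 800 + [0.0] * 800
periodic = periodic_excitation(f0_track, sample_rate)
aperiodic = aperiodic_excitation(len(f0_track))
waveform = mix(periodic, aperiodic)
```

In the unvoiced half of this toy signal only the noise remains, which is the intuition behind treating the two components separately.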

Cited by 9 publications (7 citation statements)
References 20 publications

“…To address these issues, GAN-based vocoders have been widely explored to take advantage of the compact generator size because the discriminator greatly helps the compact generator achieve high-fidelity speech generation. Parallel WaveGAN (PWG) [35] and MelGAN [36] are the recent most popular GAN-based vocoders, and many subsequent GAN-based vocoders are based on them [11], [12], [13], [14], [25], [26], [27], [37], [38], [39], [40], [41]. Non-autoregressive models without GAN are also proposed.…”
Section: B Neural Vocoders Based On Generative Models
Citation type: mentioning
confidence: 99%
“…To further improve the F0 controllability, we introduce F0-driven mechanisms designed on the basis of QP-PWG and NSF into the source network. Moreover, inspired by the recent successes of the neural vocoders that adopt harmonic-plus-noise (HN) speech modeling [13], [14], [21], [22], we introduce HN source excitation generation to obtain better sound quality. The overall architecture of uSFGAN is shown in Fig.…”
Section: Unified Source-filter GAN
Citation type: mentioning
confidence: 99%
“…In [95], a non-autoregressive neural vocoder called PeriodNet [107] is adopted, which is a non-autoregressive GAN-based neural vocoder that is shown to be more robust for generating accurate pitch. Moreover, an automatic pitch correction technique is incorporated that ensures accurate pitch in the synthesized singing voices.…”
Section: Multi-variate Density Output
Citation type: mentioning
confidence: 99%