2020
DOI: 10.1109/taslp.2020.2970241
A Neural Vocoder With Hierarchical Generation of Amplitude and Phase Spectra for Statistical Parametric Speech Synthesis

Cited by 31 publications (28 citation statements)
References 23 publications
“…Inspired by the neural excitation generation of differentiable digital signal processing (DDSP) [52] and the neural spectral filtering of NSF, completely differentiable source-filter vocoders with a GAN structure such as neural homomorphic vocoder (NHV) [20] and HooliGAN [21] also have been proposed. Furthermore, the authors of HiNet [22] also adopt a deep NN (DNN) model and an NSF model with GAN structures to respectively predict amplitude spectrum and phase for hierarchical speech generation.…”
Section: B. GAN-Based Vocoders
confidence: 99%
“…Next, subjective experiments show that our bandwidth extension method consistently offers significant perceptual quality improvement to the results of speech denoising systems including HiFi-GAN [5], DEMUCS [7] and DeepMMSE [8]. It also improves the quality of vocoders including WaveNet [1], WaveRNN [12] and HiNet [13] which could potentially be applied to TTS as well.…”
Section: Introduction
confidence: 86%
“…We use the same trained model as in the denoising task in Section 4.2, and apply it to the outputs of three vocoding algorithms, including WaveNet [1], WaveRNN [12] and HiNet [13]. We took their audio samples from HiNet's project website.…”
Section: Bandwidth Extension for Waveform Generation
confidence: 99%
“…Eunwoo et al [24] proposed a Long-Short-Term-Memory (LSTM) based Recurrent Neural Network for TTS. Furthermore, many researchers have used other Neural Networks for TTS [25], [26]. These autoregressive models directly generate raw audios, which makes them expensive and slow.…”
Section: A. Audio Generation
confidence: 99%