A Log Domain Pulse Model for Parametric Speech Synthesis

Degottex, Gilles; Lanchantin, Pierre; Gales, Mark J. F.

doi:10.1109/taslp.2017.2761546

Cited by 21 publications

(37 citation statements)

References 31 publications


“…This work used a conventional vocoder called WORLD [10] as the baseline. It then included a phase-recovery technique [12], a waveform synthesizer based on a log-domain pulse model [11], and a Wavenet-based vocoder for comparison. Complex-valued approaches may be included in future work.…”

Section: Relationship Between Acoustic Features and Waveforms (mentioning)

confidence: 99%

A Comparison of Recent Waveform Generation and Acoustic Modeling Methods for Neural-Network-Based Speech Synthesis

Wang, Lorenzo-Trueba, Takaki, et al. 2018

2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)


Recent advances in speech synthesis suggest that limitations such as the lossy nature of the amplitude spectrum with minimum phase approximation and the over-smoothing effect in acoustic modeling can be overcome by using advanced machine learning approaches. In this paper, we build a framework in which we can fairly compare new vocoding and acoustic modeling techniques with conventional approaches by means of a large scale crowdsourced evaluation. Results on acoustic models showed that generative adversarial networks and an autoregressive (AR) model performed better than a normal recurrent network and the AR model performed best. Evaluation on vocoders by using the same AR acoustic model demonstrated that a Wavenet vocoder outperformed classical source-filter-based vocoders. Particularly, generated speech waveforms from the combination of AR acoustic model and Wavenet vocoder achieved a similar score of speech quality to vocoded speech.



“…How to analyze and generate the random component for synthetic voice has been a difficult problem [5,7,11,12]. In addition to this difficulty in analysis and synthesis, auditory perception introduces another difficulty.…”

Section: Background and Related Work (mentioning)

confidence: 99%

“…Group delay manipulation used in legacy-STRAIGHT was successful for reducing this impression [4]. The log domain pulse model (LDPM) also uses phase manipulation [7]. However, such manipulation results smearing of the signal in the time domain.…”

Section: Background and Related Work (mentioning)

confidence: 99%

Frequency Domain Variants of Velvet Noise and Their Application to Speech Processing and Synthesis

Kawahara, Sakakibara, Morise, et al. 2018

Interspeech 2018


We propose a new excitation source signal for VOCODERs and an all-pass impulse response for post-processing of synthetic sounds and pre-processing of natural sounds for data-augmentation. The proposed signals are variants of velvet noise, which is a sparse discrete signal consisting of a few non-zero (1 or -1) elements and sounds smoother than Gaussian white noise. One of the proposed variants, FVN (Frequency domain Velvet Noise) applies the procedure to generate a velvet noise on the cyclic frequency domain of DFT (Discrete Fourier Transform). Then, by smoothing the generated signal to design the phase of an all-pass filter followed by inverse Fourier transform yields the proposed FVN. Temporally variable frequency weighted mixing of FVN generated by frozen and shuffled random number provides a unified excitation signal which can span from random noise to a repetitive pulse train. The other variant, which is an all-pass impulse response, significantly reduces "buzzy" impression of VOCODER output by filtering. Finally, we will discuss applications of the proposed signal for watermarking and psychoacoustic research.
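The abstract describes velvet noise as a sparse signal with only a few non-zero samples, each equal to 1 or -1. A minimal sketch of the classic time-domain construction is given here for orientation only; it is not the FVN variant proposed in the paper. The function name `velvet_noise`, the `grid_size` parameter, and the one-impulse-per-segment layout are assumptions made for illustration.

```python
import random


def velvet_noise(num_samples, grid_size, seed=0):
    """Return a list of num_samples values drawn from {-1, 0, 1}.

    Assumption (not from the abstract): the signal is split into
    consecutive segments of grid_size samples, and each segment gets
    exactly one impulse at a random position with a random sign.
    """
    rng = random.Random(seed)
    signal = [0] * num_samples
    for start in range(0, num_samples, grid_size):
        end = min(start + grid_size, num_samples)
        position = rng.randrange(start, end)     # random location in segment
        signal[position] = rng.choice((-1, 1))   # random sign
    return signal


noise = velvet_noise(num_samples=400, grid_size=20)
nonzero_count = sum(1 for x in noise if x != 0)
print(f"{nonzero_count} non-zero samples out of {len(noise)}")
```

With these hypothetical settings, one sample in every 20 is non-zero, which is the sparsity the abstract refers to.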


“…Meanwhile, recent signal processing methods for vocoding have improved the synthetic speech quality. These techniques include source-filter models [7], [8], sinusoidal harmonic-plus-noise models [9], advanced aperiodicity models [10], [11], and direct modeling of the magnitude and phase spectra [12]. Furthermore, the ongoing emergence of neural network waveform generation models, i.e.…”

Section: Introduction (mentioning)

confidence: 99%

GlotNet—A Raw Waveform Model for the Glottal Excitation in Statistical Parametric Speech Synthesis

Juvela, Bollepalli, Tsiaras, et al. 2019

IEEE/ACM Trans. Audio Speech Lang. Process.
