2018
DOI: 10.1109/taslp.2017.2761546

A Log Domain Pulse Model for Parametric Speech Synthesis

Abstract: Most of the degradation in current Statistical Parametric Speech Synthesis (SPSS) results from the form of the vocoder. One of the main causes of degradation is the reconstruction of the noise. In this article, a new signal model is proposed that leads to a simple synthesizer, without the need for ad-hoc tuning of model parameters. The model is not based on the traditional additive linear source-filter model; instead, it adopts a combination of speech components that are additive in the log domain. Also, the same repre…
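To make the phrase "additive in the log domain" concrete, here is a minimal NumPy sketch. It is not the paper's model: the abstract is truncated, so the component names (`envelope`, `pulse_phase`, `noise_phase`) and both helper functions are hypothetical, chosen only for illustration. The one fact it relies on is that summing complex log-spectra is equivalent to multiplying the corresponding linear spectra, whereas a traditional additive model sums the linear spectra directly.

```python
import numpy as np


def combine_linear(deterministic, noise):
    """Additive linear combination: spectra are summed directly."""
    return deterministic + noise


def combine_log(log_components):
    """Log-domain combination: log-spectra are summed, then exponentiated.

    Summing logs is equivalent to multiplying the linear spectra.
    """
    return np.exp(np.sum(log_components, axis=0))


# Toy per-bin quantities; all names and values are illustrative assumptions.
rng = np.random.default_rng(0)
n_bins = 8
envelope = np.linspace(1.0, 0.1, n_bins)          # hypothetical amplitude envelope
pulse_phase = np.zeros(n_bins)                    # hypothetical zero-phase pulse
noise_phase = rng.uniform(-np.pi, np.pi, n_bins)  # hypothetical random phase

# Each component contributes one additive term in the complex log domain.
log_components = np.stack([
    np.log(envelope) + 0j,  # log-amplitude term
    1j * pulse_phase,       # deterministic phase term
    1j * noise_phase,       # random phase term
])
spectrum = combine_log(log_components)

# For contrast, a linear additive mix of a deterministic and a noise spectrum.
linear_mix = combine_linear(envelope.astype(complex),
                            0.1 * np.exp(1j * noise_phase))
```

In this toy setup, the magnitude of `spectrum` equals `envelope` exactly, because the phase terms are purely imaginary in the log domain and so change only the phase. In `linear_mix`, by contrast, the added noise term changes the magnitude as well.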

Cited by 21 publications (37 citation statements)
References 31 publications
“…This work used a conventional vocoder called WORLD [10] as the baseline. It then included a phase-recovery technique [12], a waveform synthesizer based on a log-domain pulse model [11], and a Wavenet-based vocoder for comparison. Complex-valued approaches may be included in future work.…”
Section: Relationship Between Acoustic Features and Waveforms (mentioning)
confidence: 99%
“…How to analyze and generate the random component for synthetic voice has been a difficult problem [5,7,11,12]. In addition to this difficulty in analysis and synthesis, auditory perception introduces another difficulty.…”
Section: Background and Related Work (mentioning)
confidence: 99%
“…Group delay manipulation used in legacy-STRAIGHT was successful for reducing this impression [4]. The log domain pulse model (LDPM) also uses phase manipulation [7]. However, such manipulation results smearing of the signal in the time domain.…”
Section: Background and Related Work (mentioning)
confidence: 99%
“…Meanwhile, recent signal processing methods for vocoding have improved the synthetic speech quality. These techniques include source-filter models [7], [8], sinusoidal harmonic-plus-noise models [9], advanced aperiodicity models [10], [11], and direct modeling of the magnitude and phase spectra [12]. Furthermore, the ongoing emergence of neural network waveform generation models, i.e.…”
Section: Introduction (mentioning)
confidence: 99%