Interspeech 2017
DOI: 10.21437/interspeech.2017-678

Time-Domain Envelope Modulating the Noise Component of Excitation in a Continuous Residual-Based Vocoder for Statistical Parametric Speech Synthesis

Abstract: In this paper, we present an extension of a novel continuous residual-based vocoder for statistical parametric speech synthesis. Previous work has shown the advantages of adding envelope-modulated noise to the voiced excitation, but this has not yet been investigated in the context of continuous vocoders, i.e. vocoders whose parameters are all continuous. The noise component is often not accurately modeled in modern vocoders (e.g. STRAIGHT). For more natural sounding speech synthesis, four time-domain envelopes (Amp…
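The core idea of the abstract, shaping the noise part of the excitation with a time-domain envelope, can be illustrated with a minimal sketch. It assumes a smoothed absolute-value (moving-average) envelope of the voiced excitation; this is an illustrative choice, not necessarily one of the four envelopes evaluated in the paper, and the high-pass filtering that confines the noise to the band above the MVF is omitted. The function names `amplitude_envelope` and `modulate_noise` are hypothetical.

```python
import random


def amplitude_envelope(x, win=5):
    """Moving average of |x| over `win` samples (window truncated at the edges).

    Illustrative envelope only; assumed, not taken from the paper.
    """
    half = win // 2
    env = []
    for n in range(len(x)):
        seg = x[max(0, n - half):min(len(x), n + half + 1)]
        env.append(sum(abs(v) for v in seg) / len(seg))
    return env


def modulate_noise(voiced, win=5, seed=0):
    """Scale white Gaussian noise sample by sample with the envelope of `voiced`."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    env = amplitude_envelope(voiced, win)
    return [e * rng.gauss(0.0, 1.0) for e in env]


# Usage: a unit pulse every 80 samples (200 Hz at 16 kHz) stands in for the
# voiced excitation; the modulated noise is non-zero only near the pulses.
voiced = [1.0 if n % 80 == 0 else 0.0 for n in range(240)]
noise = modulate_noise(voiced)
```

Because the noise is multiplied by an envelope that follows the pitch pulses, its energy is concentrated around each glottal epoch instead of being spread uniformly over the period.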

Cited by 16 publications (38 citation statements). References 21 publications.
“…Next, we removed the post-processing step in the estimation of the MVF parameter and thus improved the modelling of unvoiced sounds within our continuous vocoder [29]. Finally, we applied various time domain envelopes for advanced modeling of the noise excitation [30].…”
Section: Continuous F0 Modeling Within Vocoders (mentioning; confidence: 99%)
“…During the synthesis phase, voiced excitation is composed of residual excitation frames overlap-added pitch synchronously, depending on the continuous F0 [28,29,30]. After that, this voiced excitation is lowpass filtered frame by frame at the frequency given by the MVF parameter.…”
Section: Continuous Vocoder (mentioning; confidence: 99%)
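The synthesis step quoted above (residual frames overlap-added pitch-synchronously from the continuous F0, then low-pass filtered frame by frame at the MVF) can be sketched as follows. This is a minimal sketch under stated assumptions: F0 is held constant within each frame, a single fixed residual frame (hypothetical values) is reused at every epoch, and the low-pass filter is a 31-tap Hamming-windowed sinc FIR. The pitch-mark placement, filter design, and function names are illustrative, not the vocoder's actual implementation.

```python
import math


def pitch_marks(f0_frames, frame_shift, fs):
    """Epoch positions (samples): step fs / F0, with F0 looked up per frame.

    A continuous F0 track is positive everywhere, so no unvoiced gaps occur.
    """
    total = len(f0_frames) * frame_shift
    marks, t = [], 0.0
    while t < total:
        marks.append(int(round(t)))
        frame = min(int(t) // frame_shift, len(f0_frames) - 1)
        t += fs / f0_frames[frame]
    return marks


def overlap_add(residual, marks, length):
    """Add one copy of the residual frame, centred on every pitch mark."""
    out = [0.0] * length
    half = len(residual) // 2
    for m in marks:
        for i, v in enumerate(residual):
            n = m - half + i
            if 0 <= n < length:
                out[n] += v
    return out


def lowpass_fir(cutoff, fs, taps=31):
    """Hamming-windowed sinc low-pass FIR, normalised to unity DC gain."""
    fc = cutoff / fs
    mid = taps // 2
    h = []
    for n in range(taps):
        k = n - mid
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        h.append(ideal * (0.54 - 0.46 * math.cos(2 * math.pi * n / (taps - 1))))
    s = sum(h)
    return [v / s for v in h]


def framewise_lowpass(x, mvf_frames, frame_shift, fs, taps=31):
    """Filter each sample with the low-pass FIR designed from its frame's MVF."""
    filters = [lowpass_fir(f, fs, taps) for f in mvf_frames]
    mid = taps // 2
    y = [0.0] * len(x)
    for n in range(len(x)):
        h = filters[min(n // frame_shift, len(filters) - 1)]
        acc = 0.0
        for k, hk in enumerate(h):
            j = n + mid - k  # centred (zero-delay) convolution
            if 0 <= j < len(x):
                acc += hk * x[j]
        y[n] = acc
    return y


# Usage: 4 frames of 5 ms at 16 kHz, F0 = 100 Hz, MVF = 4 kHz.
fs, shift = 16000, 80
f0, mvf = [100.0] * 4, [4000.0] * 4
residual = [0.0, 0.5, 1.0, 0.5, 0.0]  # hypothetical residual frame
marks = pitch_marks(f0, shift, fs)
voiced = framewise_lowpass(overlap_add(residual, marks, 4 * shift), mvf, shift, fs)
```

Designing one filter per frame keeps the cut-off tied to that frame's MVF value; a real implementation would likely smooth the transitions between frames.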
“…Previous studies have shown that human voice can be modelled effectively as a sum of sinusoids and has shown the capability of providing high-quality copy synthesis and prosodic modifications [29] [30] [31]. Therefore, in [32] we proposed a continuous sinusoidal model (CSM) that is applicable in statistical frameworks by keeping the number of our vocoder parameters unchanged [26]. Experimental results from objective and subjective evaluations have shown that the proposed vocoder gives state-of-the-art vocoders performance in analysis-synthesis while outperforming the previous work of our continuous F0 based source-filter vocoder.…”
Section: Related Work (mentioning; confidence: 99%)
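For reference, the general harmonic form of a sum-of-sinusoids model is shown below, with time-varying amplitudes A_k(t), fundamental frequency F_0(t), initial phases \phi_k(0), and K harmonics. This is the textbook form; the exact parameterization used in the CSM may differ.

```latex
s(t) = \sum_{k=1}^{K} A_k(t)\,\cos\!\big(\phi_k(t)\big),
\qquad
\phi_k(t) = \phi_k(0) + 2\pi k \int_{0}^{t} F_0(\tau)\,d\tau
```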
“…By keeping the number of our previous source-filter vocoder parameters unchanged [26] and similarly to [29] [43], the synthesis algorithm implemented in this paper decomposes the speech frames into a lower-band voiced component sv(t) and an upper-band noise component sn(t) based on MVF values. We define these components here as…”
Section: B. Continuous Sinusoidal Model (mentioning; confidence: 99%)
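The decomposition of a frame into a lower-band voiced component s_v(t) and an upper-band noise component s_n(t) based on the MVF can be sketched under simple assumptions: s_v(t) is a sum of zero-phase harmonics of a constant F0 lying strictly below the MVF, and s_n(t) is white Gaussian noise high-passed at the MVF with a spectrally inverted windowed-sinc FIR. The harmonic amplitudes, noise gain, filter length, and function names are hypothetical and not taken from the cited work.

```python
import math
import random


def voiced_component(f0, mvf, amps, fs, n_samples):
    """s_v: zero-phase harmonics k*f0 (amplitude amps[k-1]) strictly below the MVF."""
    out = [0.0] * n_samples
    for k, a in enumerate(amps, start=1):
        fk = k * f0
        if fk >= mvf:
            break  # this and all higher harmonics fall in the noise band
        for n in range(n_samples):
            out[n] += a * math.cos(2 * math.pi * fk * n / fs)
    return out


def highpass_fir(cutoff, fs, taps=31):
    """High-pass FIR by spectral inversion of a Hamming-windowed sinc low-pass."""
    fc = cutoff / fs
    mid = taps // 2
    lp = []
    for n in range(taps):
        k = n - mid
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        lp.append(ideal * (0.54 - 0.46 * math.cos(2 * math.pi * n / (taps - 1))))
    s = sum(lp)
    # delta minus a unity-DC-gain low-pass gives zero gain at DC
    return [(1.0 if n == mid else 0.0) - v / s for n, v in enumerate(lp)]


def noise_component(mvf, fs, n_samples, gain=0.1, seed=0):
    """s_n: white Gaussian noise (standard deviation `gain`) high-passed at the MVF."""
    rng = random.Random(seed)
    w = [rng.gauss(0.0, gain) for _ in range(n_samples)]
    h = highpass_fir(mvf, fs)
    mid = len(h) // 2
    out = []
    for n in range(n_samples):
        acc = 0.0
        for k, hk in enumerate(h):
            j = n + mid - k  # centred convolution
            if 0 <= j < n_samples:
                acc += hk * w[j]
        out.append(acc)
    return out


def synthesize_frame(f0, mvf, amps, fs, n_samples, seed=0):
    """s(t) = s_v(t) + s_n(t) for one frame."""
    sv = voiced_component(f0, mvf, amps, fs, n_samples)
    sn = noise_component(mvf, fs, n_samples, seed=seed)
    return [v + u for v, u in zip(sv, sn)]


# Usage: one 10 ms frame at 16 kHz, F0 = 100 Hz, MVF = 4 kHz, flat
# (hypothetical) amplitudes; harmonics 1..39 (100 to 3900 Hz) enter s_v.
frame = synthesize_frame(100.0, 4000.0, [1.0] * 60, 16000, 160)
```

Splitting at the MVF keeps the two components in complementary bands, so the voiced harmonics and the noise do not overlap spectrally (up to the finite transition width of the filter).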