Interspeech 2007
DOI: 10.21437/interspeech.2007-530

A trainable excitation model for HMM-based speech synthesis

Cited by 26 publications (32 citation statements)
References 10 publications
“…Some efforts have been devoted in speech synthesis in order to enhance the quality and naturalness by adopting a more subtle excitation model. In the Codebook Excited Linear Predictive (CELP) approach [4], the residual signal is constructed from a codebook containing several typical excitation frames [5]. The Multi Band Excitation (MBE) modeling [6] suggests to divide the frequency axis in several bands, and a voiced/unvoiced decision is taken for each band at each time.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…According to the Mixed Excitation (ME) approach [7], the residual signal is the superposition of both a periodic and a non-periodic component. Various models derived from the ME approach have been used in HMM-based speech synthesis [8], [9], [10]. A popular technique used in parametric synthesis is the STRAIGHT vocoder [11].…”
Section: Introduction
Citation type: mentioning
confidence: 99%
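
The superposition described in the statement above can be illustrated with a minimal sketch. It is a deliberate simplification, not the method of any cited paper: a single broadband weight (the hypothetical `voicing` parameter) mixes a pulse train with white noise, whereas mixed-excitation vocoders typically shape each component with filters. The function name and all parameters are assumptions made for illustration.

```python
import math
import random


def mixed_excitation(n_samples, f0_hz, sample_rate, voicing, seed=0):
    """Return an excitation signal: voicing * pulse train + (1 - voicing) * noise.

    Illustrative only: one broadband mixing weight, no band-wise filtering.
    """
    if not 0.0 <= voicing <= 1.0:
        raise ValueError("voicing must lie in [0, 1]")
    rng = random.Random(seed)
    # Pitch period in samples (at least one sample).
    period = max(1, round(sample_rate / f0_hz))
    # Periodic component: one pulse per pitch period, scaled so that the
    # pulse train has unit average power.
    pulse_amp = math.sqrt(period)
    periodic = [pulse_amp if n % period == 0 else 0.0 for n in range(n_samples)]
    # Non-periodic component: white Gaussian noise with unit variance.
    noise = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    # Superpose the two components sample by sample.
    return [voicing * p + (1.0 - voicing) * w for p, w in zip(periodic, noise)]


# Example: 20 ms of excitation at 16 kHz with a 100 Hz pitch, mostly voiced.
excitation = mixed_excitation(320, 100.0, 16000, voicing=0.8)
```

Setting `voicing` to 1.0 reduces the sketch to the traditional pulse excitation, and 0.0 gives pure noise, the unvoiced case.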
“…• Although we proposed the use of a Principal Component Analysis, other data mining methods (possibly derived from the functional PCA literature, [21]) could be efficiently employed to extract a suitable representation from the large dataset of normalized GCI-centered residual frames (obtained as described in Section 2.1). • Finally, it would certainly be very interesting to compare the proposed approach with other techniques of excitation modeling, such as STRAIGHT [22], the mixed excitation [7], [8], or based on the Liljencrants-Fant model [9]. Although all these approaches reported a relative improvement with regard to the traditional pulse excitation, no comparison is available yet, since authors worked with different synthesis frameworks and with different databases.…”
Section: Discussion
Citation type: mentioning
confidence: 99%
“…In [7], the filter coefficients were derived from bandpass voicing strengths. In [8], state-dependent high-degree filters were directly trained using a closed loop procedure. The integration of a Liljencrants-Fant waveform as a modeling of the glottal source, possibly producing different voice qualities by varying the LF parameters, was proposed in [9].…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…We hypothesize that a better NSF source signal for voiced sounds may contain a certain degree of randomness in the short term while preserving long-term periodicity. Although source signals for classical speech vocoders [24,25,26,27] may be used, we focus on source signals that have a simple parametric form and require no additional analysis loop.…”
Section: Cyclic Noise-based Source Signal
Citation type: mentioning
confidence: 99%