2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100)
DOI: 10.1109/icassp.2000.861820

Speech parameter generation algorithms for HMM-based speech synthesis

Abstract: This paper derives a speech parameter generation algorithm for HMM-based speech synthesis, in which the speech parameter sequence is generated from HMMs whose observation vector consists of a spectral parameter vector and its dynamic feature vectors. In the algorithm, we assume that the state sequence (state and mixture sequence for the multi-mixture case) or a part of the state sequence is unobservable (i.e., hidden or latent). As a result, the algorithm iterates the forward-backward algorithm and the parameter gen…

Citations: cited by 710 publications (541 citation statements)
References: 11 publications (2 reference statements)
“…Thus it introduces some level of discontinuity. To obtain a smooth trajectory of spectral vectors, Maximum Likelihood Parameter Generation (MLPG) [10] is used.…”
Section: Introduction (mentioning)
confidence: 99%
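
As a brief sketch of why MLPG yields a smooth trajectory: the static sequence is chosen so that its static and dynamic features jointly fit the model statistics. The symbols below are assumptions introduced for illustration and do not appear in the quoted statement: $c$ is the static feature sequence, $W$ the matrix that appends dynamic features to it, and $\mu$, $\Sigma$ the stacked mean vector and covariance matrix of the selected HMM states.

```latex
\hat{c} = \arg\max_{c}\; \mathcal{N}\!\left(Wc;\ \mu,\ \Sigma\right)
\quad\Longrightarrow\quad
W^{\top}\Sigma^{-1}W\,\hat{c} = W^{\top}\Sigma^{-1}\mu
```

Because the dynamic-feature rows of $W$ couple neighbouring frames, the solution $\hat{c}$ cannot jump freely between state means, which is what removes the discontinuity at state boundaries.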
“…⊤ denotes the joint static and dynamic feature sequence, W is a transform matrix to extend the static feature sequence into the static and dynamic feature sequence [15]. To avoid the complicated formula ∑ m in Eq.…”
Section: Batch-type Prediction Process (mentioning)
confidence: 99%
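
To make the role of the transform matrix W concrete, the following is a minimal sketch, assuming a one-dimensional static feature, one delta window with coefficients [-0.5, 0, 0.5] (a common choice, not taken from the quoted paper), and diagonal covariances. The names `build_window_matrix`, `mlpg`, and `WINDOWS` are hypothetical. The sketch builds W, then solves the weighted least-squares normal equations for the static sequence under a fixed state sequence.

```python
import numpy as np

# Assumed windows: identity for the static feature, centred difference for delta.
WINDOWS = [np.array([1.0]), np.array([-0.5, 0.0, 0.5])]


def build_window_matrix(num_frames, windows):
    """Return W such that W @ c interleaves, frame by frame, the static value
    and each dynamic feature of the static sequence c."""
    num_windows = len(windows)
    W = np.zeros((num_frames * num_windows, num_frames))
    for t in range(num_frames):
        for k, win in enumerate(windows):
            half = len(win) // 2
            for j, coeff in enumerate(win):
                tau = t + j - half
                if 0 <= tau < num_frames:  # taps outside the sequence are dropped
                    W[t * num_windows + k, tau] = coeff
    return W


def mlpg(means, variances, windows):
    """Solve (W^T P W) c = W^T P mu, where P is the diagonal precision matrix.

    means, variances: arrays of shape (num_frames, num_windows), one row per
    frame, taken from the HMM state assigned to that frame (assumed given).
    """
    num_frames = means.shape[0]
    W = build_window_matrix(num_frames, windows)
    mu = means.reshape(-1)
    precision = 1.0 / variances.reshape(-1)
    A = W.T @ (precision[:, None] * W)
    b = W.T @ (precision * mu)
    return np.linalg.solve(A, b)


# Toy example: the static mean steps from 0 to 1, delta means are all 0.
means = np.array([[0.0, 0.0], [0.0, 0.0], [1.0, 0.0], [1.0, 0.0]])
variances = np.ones_like(means)
trajectory = mlpg(means, variances, WINDOWS)
print(trajectory)
```

In this toy setting the zero-mean delta constraint pulls adjacent frames together, so the generated trajectory rises more gradually than the step in the static means. A practical implementation would exploit the banded structure of W^T P W rather than forming dense matrices.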
“…On the other hand, the frame spectral feature (i.e., MGC) vector sequence is generated by an HMM parameter generation algorithm [52], given the CDHMMs, the estimated state durations, and the contextual information (i.e., $I(s_{n-1}^{n+1})$, $F(s_{n-1}^{n+1})$, $p_n$, $q_n$, $r_n$, and $B_{n-1}^{n}$). It is noted that the energy level of each syllable CD-HMM (i.e., an Initial CD-HMM connected with a Final CD-HMM) is scaled to $se'_n$ before executing the parameter generation algorithm so as to make the generated energy contour smooth and approximate the desired syllable energy levels.…”
Section: Speech Synthesis (mentioning)
confidence: 99%