2002
DOI: 10.1002/scj.1133

Pitch pattern generation using multispace probability distribution HMM

Abstract: A scheme for simultaneously modeling and generating a pitch pattern and a spectral sequence on the basis of a hidden Markov model (HMM) is presented. Since a pitch pattern is expressed as a time series of voiced intervals taking continuous values and voiceless intervals without values, it cannot be modeled by the usual HMM. This paper proposes a scheme for modeling a pitch and a spectrum integrally with characteristic parameters that combine pitch parameters and spectral parameters by applying an HMM ba…

Cited by 13 publications (19 citation statements)
References 12 publications
“…In particular, the domains of the dynamic F0 features (normally the 1st and 2nd order derivatives of the static F0 observations, referred to as delta and delta-delta features, respectively) are also discontinuous. Hence, for frames at the boundaries between voiced and unvoiced regions, they cannot be directly calculated and are therefore defined as NULL in the most widely used implementation of MSDHMM, i.e., these frames are regarded as unvoiced as far as the dynamic features are concerned [11]. This means that near a boundary, the static F0 feature can be a real value whilst the delta and delta-delta features are NULL.…”
Section: Discontinuous F0 Modelling (mentioning)
confidence: 99%
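
The boundary behaviour described in this statement can be illustrated with a minimal sketch. The three-frame window with coefficients (-0.5, 0, 0.5), the use of `None` as the NULL symbol, and the treatment of utterance edges as undefined are assumptions chosen for illustration; the quoted paper does not specify them, and the function names are hypothetical.

```python
UNVOICED = None  # stand-in for the NULL symbol (an assumption of this sketch)

def delta(f0, t):
    """First-order dynamic feature at frame t, or UNVOICED if it is undefined.

    Assumes a centred three-frame window with coefficients (-0.5, 0, 0.5).
    """
    # Frames at the utterance edges have no full window (sketch assumption).
    if t - 1 < 0 or t + 1 >= len(f0):
        return UNVOICED
    window = (f0[t - 1], f0[t], f0[t + 1])
    # If any frame in the window is unvoiced, the derivative cannot be computed.
    if any(v is UNVOICED for v in window):
        return UNVOICED
    return 0.5 * (f0[t + 1] - f0[t - 1])

def delta_track(f0):
    """Delta feature for every frame of an F0 track."""
    return [delta(f0, t) for t in range(len(f0))]

# Frames 1 and 3 are voiced but sit next to unvoiced frames.
f0 = [UNVOICED, 120.0, 122.0, 126.0, UNVOICED]
print(delta_track(f0))  # [None, None, 3.0, None, None]
```

In this toy track, frame 1 has a real static value (120.0) but a NULL delta, which is the mismatch between static and dynamic streams that the statement points out.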
“…(2). Hence, the state output distribution of the full F0 observation is a product of the output distributions of the static and dynamic streams [11].…”
Section: Discontinuous F0 Modelling (mentioning)
confidence: 99%
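
A hedged sketch of the product form mentioned in this statement: each stream (static, delta, delta-delta) is given its own two-space distribution, and the state output probability is the product over streams. The single Gaussian per voiced space, the parameter layout `(voiced_weight, mean, var)`, and all function names are assumptions for illustration, not the cited implementation.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def msd_stream_prob(obs, voiced_weight, mean, var):
    """Two-space output probability for one stream.

    obs is a float (voiced, one-dimensional space) or None
    (unvoiced, zero-dimensional space).
    """
    if obs is None:
        return 1.0 - voiced_weight
    return voiced_weight * gaussian_pdf(obs, mean, var)

def state_output_prob(observation, streams):
    """Product of per-stream probabilities for one frame.

    observation: (static, delta, delta_delta), each a float or None.
    streams: one (voiced_weight, mean, var) tuple per stream.
    """
    p = 1.0
    for obs, (w, mean, var) in zip(observation, streams):
        p *= msd_stream_prob(obs, w, mean, var)
    return p

# A boundary frame: static F0 is real, both dynamic features are NULL.
streams = [(0.9, 5.0, 1.0), (0.8, 0.0, 1.0), (0.7, 0.0, 1.0)]
p = state_output_prob((5.0, None, None), streams)
```

Because the streams are multiplied independently, a boundary frame is scored as voiced by the static stream and as unvoiced by the dynamic streams.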
“…Though there are some exceptions [8,4], the most widely used method is to model static and dynamic features in separate streams [9]. This common implementation limits the power of HMMs to model the F0 trajectory.…”
Section: Comparison Of F0 Modelling Approaches For HMM Based Speech S… (mentioning)
confidence: 99%
“…As a result, subsequent F0 modeling and generation suffer. Standard HMM-based TTS [2] uses multi-space distribution (MSD) to model and generate discontinuous F0 trajectories [18]. Faulty voicing decisions resulting from the F0 extraction phase will cause the deteriorately trained MSD-HMMs to synthesize voiced frames as unvoiced, resulting in hoarse speech, or to synthesize unvoiced frames as voiced, resulting in buzzy speech [19].…”
Section: Introduction (mentioning)
confidence: 99%