2014
DOI: 10.1186/1687-4722-2014-12

Context-dependent acoustic modeling based on hidden maximum entropy model for statistical parametric speech synthesis

Abstract: Decision tree-clustered context-dependent hidden semi-Markov models (HSMMs) are typically used in statistical parametric speech synthesis to represent probability densities of acoustic features given contextual factors. This paper addresses three major limitations of this decision tree-based structure: (i) The decision tree structure lacks adequate context generalization. (ii) It is unable to express complex context dependencies. (iii) Parameters generated from this structure represent sudden transitions betwe…

Cited by 7 publications (4 citation statements)
References 41 publications
“…STRAIGHT extracts 513-D spectral envelope (SP), 513-D aperiodicity (AP) features as well as 1-D fundamental frequency (F0). We employ speech signal processing toolkit (SPTK) to convert SP to 40-D Mel-cepstral coefficients (MCEP) [15]. The de-identification is only applied to MCEP and F0 features; the AP features are directly mapped from the source speaker to the de-identified speaker.…”
Section: Features (mentioning)
confidence: 99%
“…In paper [60], authors successfully use neural-scaled entropy (NSE) and cochlear-scaled entropy (CSE) in order to predict the effects of nonlinear frequency compression on speech perception due to sensorineural hearing loss (SNHL). In speech synthesis, the major limitations imposed with the use of tree-clustered context-dependent hidden semi-Markov models adopted are fruitfully circumvented by the authors of paper [40] based on entropy.…”
Section: A Review On Entropy and Its Applications (mentioning)
confidence: 99%
“…F0 modeling with additive structures has also been used to express the relationship between contextual factors and the F0 trajectory [44][45][46][47][48][49][50][51][52][53][54]. Contextual additive modeling [45][46][47][48] assumes model parameters to be a sum of multiple independent components, each having different context dependencies; therefore, different decision trees have to be trained for them.…”
Section: Related Work (mentioning)
confidence: 99%