IEEE International Conference on Acoustics Speech and Signal Processing 2002
DOI: 10.1109/icassp.2002.5743763

Analysis of syllable duration models for Mandarin speech

Abstract: In this paper, the multiplicative syllable duration model previously proposed for Mandarin speech is extended in several respects. First, the three basic Tone 3 patterns (i.e., full tone, half tone, and sandhi tone) are handled by using three different companding factors (CFs) to separate their effects. Second, the CFs of the model are analyzed in detail. Third, the syllable duration modeling method is applied to an automatically segmented, 500-speaker, telephone-speech database. Fourth, a comparat…
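The abstract does not give the model's equations, so the following is only a minimal sketch of what a multiplicative duration model with companding factors could look like. It assumes the observed syllable duration is a normalized duration scaled by the product of one CF per affecting factor. The factor names, the separate entries for the three Tone 3 patterns, and all numeric values are hypothetical illustrations, not figures from the paper.

```python
from math import prod

# Hypothetical companding factors (CFs). The values are illustrative only
# and are NOT taken from the paper. Tone 3 gets three separate entries so
# that full tone, half tone and sandhi tone can be scaled independently.
TONE_CF = {
    "1": 1.00,
    "2": 1.02,
    "3-full": 1.10,
    "3-half": 0.95,
    "3-sandhi": 1.01,
    "4": 0.97,
}
SPEAKER_CF = {"spk_a": 1.05, "spk_b": 0.92}  # hypothetical speaker-level CFs


def observed_duration(normalized_ms, factors):
    """Assumed model form: observed = normalized * product of CFs."""
    return normalized_ms * prod(factors)


def normalize_duration(observed_ms, factors):
    """Invert the assumed model: divide out the CFs of all affecting factors."""
    return observed_ms / prod(factors)


# Usage: a half-tone Tone 3 syllable spoken by the hypothetical speaker spk_a.
cfs = [TONE_CF["3-half"], SPEAKER_CF["spk_a"]]
duration_ms = observed_duration(200.0, cfs)
recovered_ms = normalize_duration(duration_ms, cfs)
```

Under this assumed form, dividing out the CFs recovers the normalized duration, which is what lets each factor's effect be examined separately.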

Cited by 6 publications
(3 citation statements)
References 7 publications
“…When performing M-step, the new estimate is obtained by solving (16) However, we could not derive the closed-form solution to new estimate . The Newton's algorithm [1] is applied to iteratively reach the optimal solution (17) where (18) In [36], MAP adaptation of Gaussian duration parameters was proposed. Gamma priors were adopted to derive MAP estimates of Gaussian mean.…”
Section: B. MAP Estimation for Gamma Duration Parameters (mentioning)
confidence: 99%
“…N-best hypotheses of fast speech could be properly selected to improve large vocabulary continuous speech recognition performance [9]. Except the effects on speech recognition, speaking rate served as an important prosody feature in text-to-speech system [18]. Synthesized speech with user-adaptive speaking rate provided good naturalness for human listeners [34].…”
Section: Introduction (mentioning)
confidence: 99%
“…In this paper we try to partially solve the problem via extending the conventional state duration method to further consider the speaking rate of the testing utterance and adding a new syllable duration model in an HMM-based Mandarin base-syllable recognizer for improving its performance. The proposed statistical duration modeling method has been tried on Min-Nan and Mandarin Chinese text-to-speech system and got some significant improvements on the duration prediction [4,5].…”
Section: Introduction (mentioning)
confidence: 99%