IET Irish Signals and Systems Conference (ISSC 2006), 2006
DOI: 10.1049/cp:20060488

A comparison of spectral continuity measures as a join cost in concatenative speech synthesis

Cited by 8 publications (8 citation statements)
References: 0 publications
“…For the first objective test, we used the Euclidean distance between mel-frequency cepstral coefficients (MFCC) (D_MFCC) as the objective measure, since this measure was found to be successful at predicting audible discontinuities in synthesized speech utterances in many studies [26]. For the second objective test, we used the Euclidean distance of delta mel-frequency cepstral coefficients (delta-MFCC) between natural and a synthesized spectral sequences (D_delta-MFCC).…”
Section: Formant Shift Factors
Citation type: mentioning
confidence: 99%
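
The objective measure described in this statement, a Euclidean distance between MFCC vectors, can be sketched as follows. This is a minimal illustration only, not the cited authors' implementation: the function name `euclidean_distance` and the three-coefficient frames are hypothetical, and MFCC extraction itself is assumed to have been done elsewhere.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical MFCC frames on either side of a concatenation point.
# Real frames would typically hold a dozen or more coefficients.
left_frame = [1.0, 2.0, 3.0]
right_frame = [1.0, 4.0, 3.0]

# A larger distance suggests a larger spectral mismatch at the join.
join_cost = euclidean_distance(left_frame, right_frame)
```

Used as a join cost, the distance would be computed between the last frame of one unit and the first frame of the next; how frames are chosen and windowed is left open here.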
“…The evaluation of each measure was conducted using the database from Kirkpatrick et al [12] and the corresponding perceptual results. The perceptual stimuli consisted of 1800 monosyllabic words.…”
Section: Database and Perceptual Experiments
Citation type: mentioning
confidence: 99%
“…The delta coefficients were generated by computing the difference between the static feature vectors. Pitch synchronous feature extraction with a window length of one pitch period was found to be the optimum strategy for the task of detecting discontinuities with static spectral measures on the test database [12].…”
Section: Modelling Trajectories
Citation type: mentioning
confidence: 99%
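
The delta computation mentioned in this statement can be sketched as a frame-to-frame difference. This is an assumption: the excerpt does not say whether a simple first difference or a regression over several frames was used, and the function name `delta_features` and the sample frames are hypothetical.

```python
def delta_features(frames):
    """First difference between consecutive static feature vectors.

    Assumes a simple frame-to-frame difference; returns one fewer
    vector than the input sequence.
    """
    return [
        [cur_c - prev_c for prev_c, cur_c in zip(prev, cur)]
        for prev, cur in zip(frames, frames[1:])
    ]


# Hypothetical static feature vectors for three consecutive frames.
static_frames = [[1.0, 2.0], [1.5, 1.0], [3.0, 1.0]]
deltas = delta_features(static_frames)
```

With pitch-synchronous extraction, each input vector would correspond to one pitch period rather than a fixed-length frame; that step is outside this sketch.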
“…Articular models are complex because they are based on the human articu- Finally, in concatenative methods, which currently are the most common, Units (syllables, phonemes, etc.) of any language are stored in a database and, thus, the speech of words and sentences can be generated employing the concatenation of different units with different times (KIRKPATRICK; O'BRIEN; SCAIFE, 2006; CHAPPELL; HANSEN, 2002). Figure 23b presents a classification of all TTS system identified in our SM.…”
Section: General Findings
Citation type: mentioning
confidence: 99%