A comparison of spectral continuity measures as a join cost in concatenative speech synthesis

Kirkpatrick, Barry; O’Brien, Darragh; Scaife, Ronan

doi:10.1049/cp:20060488

Cited by 8 publications

(8 citation statements)

References 0 publications


“…For the first objective test, we used the Euclidean distance between mel-frequency cepstral coefficients (MFCC) (D_MFCC) as the objective measure, since this measure was found to be successful at predicting audible discontinuities in synthesized speech utterances in many studies [26]. For the second objective test, we used the Euclidean distance of delta mel-frequency cepstral coefficients (delta-MFCC) between natural and synthesized spectral sequences (D_delta-MFCC).…”

Section: Formant Shift Factors (mentioning)

confidence: 99%

A flexible spectral modification method based on temporal decomposition and Gaussian mixture model

Nguyen

Akagi

2009

Acoust. Sci. & Tech.


Manipulating spectral structure often leads to degradation of speech quality, which is mainly due to insufficient smoothness of the modified spectra between frames, and ineffective spectral modification. This paper presents a new spectral modification method to improve the quality of modified speech. If frames are processed independently, discontinuous features may be generated. Therefore, a speech analysis technique called temporal decomposition (TD), which decomposes speech into event targets and event functions, is used to model the spectral evolution effectively. Instead of modifying the speech spectra frame by frame, we only need to modify event targets and event functions. This feature leads to easy modification of the speech spectra, and the smoothness of modified speech is ensured by the shape of event functions. To improve spectral modification, we explore Gaussian mixture model parameters (spectral-GMM parameters) to model the spectral envelope of each event target, and develop a new algorithm for modifying spectral-GMM parameters in accordance with formant scaling factors. We first evaluate the effectiveness of our proposed method in spectral modeling, and then apply it to two areas which require different amounts of spectral modification: emotional speech synthesis and voice gender conversion. Experimental results verify the effectiveness of the proposed method for both spectral modeling and spectral modification.
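The two objective measures named in the citation statement above (D_MFCC and D_delta-MFCC) can be illustrated with a minimal sketch. It assumes the MFCCs have already been extracted into arrays of shape (frames, coefficients) and that the natural and synthesized sequences are time-aligned and of equal length. The function names, the simple first-difference delta, and the averaging over frames are illustrative assumptions, not details taken from the cited papers.

```python
import numpy as np


def delta(features):
    """First-order difference between consecutive static feature vectors.

    Assumption: a plain frame-to-frame difference; real systems often use
    a regression window instead.
    """
    features = np.asarray(features, dtype=float)
    return np.diff(features, axis=0)


def sequence_distance(natural, synthesized):
    """Mean frame-wise Euclidean distance between two aligned sequences.

    Assumption: both inputs have shape (frames, coefficients) and are
    already time-aligned; averaging over frames is an illustrative choice.
    """
    natural = np.asarray(natural, dtype=float)
    synthesized = np.asarray(synthesized, dtype=float)
    if natural.shape != synthesized.shape:
        raise ValueError("sequences must have identical shapes")
    per_frame = np.linalg.norm(natural - synthesized, axis=1)
    return float(per_frame.mean())


def d_mfcc(natural_mfcc, synthesized_mfcc):
    """Distance on the static MFCC vectors (hypothetical helper)."""
    return sequence_distance(natural_mfcc, synthesized_mfcc)


def d_delta_mfcc(natural_mfcc, synthesized_mfcc):
    """Distance on the delta-MFCC vectors (hypothetical helper)."""
    return sequence_distance(delta(natural_mfcc), delta(synthesized_mfcc))
```

Calling `d_mfcc(a, b)` and `d_delta_mfcc(a, b)` on two aligned MFCC arrays returns one scalar each; lower values indicate that the synthesized sequence is closer to the natural one under that measure.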


“…The evaluation of each measure was conducted using the database from Kirkpatrick et al [12] and the corresponding perceptual results. The perceptual stimuli consisted of 1800 monosyllabic words.…”

Section: Database and Perceptual Experiments (mentioning)

confidence: 99%

“…The delta coefficients were generated by computing the difference between the static feature vectors. Pitch synchronous feature extraction with a window length of one pitch period was found to be the optimum strategy for the task of detecting discontinuities with static spectral measures on the test database [12].…”

Section: Modelling Trajectories (mentioning)

confidence: 99%

“…The individual feature vectors may be on considerably different scales and there may be significant correlation between the measures introducing redundancy. To overcome this problem we propose using a vector to represent a join [15], herein referred to as a join vector, x_join (4); this enables the application of a transform, A, on the join vector to rescale and decorrelate the features.…”

Section: Combining Dynamic and Static Measures (mentioning)

confidence: 99%


Spectral Dynamics as a Source of Discontinuity in Concatenative Speech Synthesis

Kirkpatrick

O’Brien

Scaife

et al. 2007

2007 15th International Conference on Digital Signal Processing


The quality of concatenative speech synthesis depends on the cost function employed for unit selection. Effective cost functions for spectral continuity have proven difficult to define and standard measures do not accurately reflect human perception of spectral discontinuity in concatenated speech. Previous studies on spectral join costs have focused predominantly on static spectral measures extracted from the unit boundary. In this paper spectral dynamic behaviour is investigated as a source of discontinuity in concatenated speech. A number of measures representing spectral dynamics are tested for the task of detecting discontinuities. The spectral dynamic measures tested contain information correlating with human perception of discontinuities, suggesting that spectral dynamics are a source of discontinuity in concatenated speech. A strategy to effectively combine dynamic and static measures is proposed using principal component analysis (PCA).
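The idea quoted above, applying a transform A to a join vector to rescale and decorrelate its features, can be sketched with PCA whitening. This is only one possible choice of A under stated assumptions: the join vector is formed here by concatenating static and dynamic measures, the transform is fitted on a set of example join vectors with at least two features, and the names `build_join_vector`, `fit_whitening_transform`, and `transform_join` are hypothetical. The cited paper's actual construction (its equation 4) is not reproduced here.

```python
import numpy as np


def build_join_vector(static_measures, dynamic_measures):
    """Stack per-join measures into one vector (assumed construction)."""
    return np.concatenate([np.ravel(static_measures), np.ravel(dynamic_measures)])


def fit_whitening_transform(join_vectors):
    """Estimate a PCA whitening transform A from example join vectors.

    join_vectors: array of shape (n_joins, n_features), n_features >= 2.
    Returns the feature mean and a matrix A such that A @ (x - mean)
    has approximately unit variance and uncorrelated components.
    """
    X = np.asarray(join_vectors, dtype=float)
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Guard against division by zero for degenerate directions.
    eigvals = np.maximum(eigvals, 1e-12)
    # Each row of A is a principal axis scaled to unit variance.
    A = (eigvecs / np.sqrt(eigvals)).T
    return mean, A


def transform_join(x_join, mean, A):
    """Apply the fitted transform to a single join vector."""
    return A @ (np.asarray(x_join, dtype=float) - mean)
```

After fitting on a representative set of joins, `transform_join` maps each new join vector into a space where no single measure dominates because of its scale, which is the motivation given in the quoted statement.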


“…Articular models are complex because they are based on the human articu- Finally, in concatenative methods, which currently are the most common, units (syllables, phonemes, etc.) of any language are stored in a database and, thus, the speech of words and sentences can be generated employing the concatenation of different units with different times (KIRKPATRICK; O'BRIEN; SCAIFE, 2006; CHAPPELL; HANSEN, 2002). Figure 23b presents a classification of all TTS systems identified in our SM.…”

Section: General Findings (mentioning)

confidence: 99%

Test oracles for systems with complex outputs: the case of TTS systems

Oliveira
