Abstract-Many current text-to-speech (TTS) systems are based on the concatenation of acoustic units of recorded speech. While this approach is believed to yield higher intelligibility and naturalness than synthesis-by-rule, it must cope with the issues raised by concatenating acoustic units that were recorded at different times and in a different order. One important issue is the synchronization of these acoustic units; in signal-processing terms, this means removing linear phase mismatches between concatenated speech frames. This paper presents two novel approaches to the synchronization of speech frames, with an application to concatenative speech synthesis. Both methods operate on phase spectra but, in contrast to previously proposed methods, do not degrade the quality of the output speech. The first method is based on the notion of center of gravity, and the second on differentiated phase data. Both are applied off-line, during the preparation of the speech database, and therefore add no computational burden at synthesis time. The proposed methods have been tested with the harmonic plus noise model (HNM) and the TTS system of AT&T Labs. The resulting synthetic speech is free of linear phase mismatches.
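As a rough illustration of the center-of-gravity idea, the sketch below aligns speech frames in the time domain so that each frame's energy-weighted center of gravity falls at a common reference point, which removes a linear phase (pure delay) mismatch between frames. This is a minimal, hypothetical sketch for intuition only: the function names and the circular-shift alignment are assumptions, not the actual algorithm of the paper, which operates on phase spectra of the analyzed speech.

```python
import numpy as np

def center_of_gravity(frame):
    # Energy-weighted mean sample index of the frame.
    # A pure time shift of the frame shifts this value by the same amount,
    # which is why it can serve as a linear-phase (delay) reference.
    energy = frame.astype(float) ** 2
    total = energy.sum()
    if total == 0.0:
        return (len(frame) - 1) / 2.0  # degenerate silent frame
    return np.dot(np.arange(len(frame)), energy) / total

def align_frame(frame):
    # Circularly shift the frame so its center of gravity lands at the
    # frame center; all frames aligned this way share a common reference,
    # so concatenating them introduces no relative linear phase mismatch.
    target = (len(frame) - 1) / 2.0
    shift = int(round(target - center_of_gravity(frame)))
    return np.roll(frame, shift)
```

Applied off-line to every unit in the database, such an alignment makes the synchronization cost zero at synthesis time, which is the key practical property claimed in the abstract.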