On the basis of the short-time relative speech rate defined by the authors, this paper determines the optimum width of the smoothing window through perceptual experiments on the naturalness of re-synthesized speech. With the optimum window of 270 ms, relative speech rates are obtained for both 'fast' and 'slow' utterances of the same sentence, using an utterance produced at a 'normal' speech rate as the reference. The averaged results show that the speech rate control function for an utterance can be approximately decomposed into a global component for each sentence and local components for each bunsetsu and each major syntactic boundary. Based on these results, a scheme is presented for controlling the local speech rate of a reference utterance to obtain a synthetic utterance of an arbitrary global speech rate.
It is well known that speech rate varies both globally and locally in natural discourse, owing to factors such as contrastive stress, syntactic boundaries, and emotion. While the global speech rate can be clearly defined by the durations of utterances and pauses, the local speech rate has not been well defined. The present authors have proposed a rigorous and quantitative definition of the relative local speech rate and presented an objective method for its measurement [S. Ohno and H. Fujisaki, Proc. EUROSPEECH’95, Vol. 1, pp. 421–424 (1995)]. Based on an analysis of the changes in both global and local speech rates found in speech material consisting of readings of a story at various speech rates, the present paper proposes rules for controlling the global and local speech rates so as to produce a synthetic discourse that fits exactly within a specified time interval. The validity of the method has been tested and confirmed by perceptual experiments using synthetic discourse of various durations generated from a natural discourse by analysis–resynthesis.
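As a rough illustration of what fitting a discourse into a specified time interval can involve, the sketch below rescales the segment durations of a reference utterance so that their total matches a target duration. It is not the authors' rule set. It assumes, purely for illustration, that the relative speech rate of each segment is the product of one global factor and a per-segment local factor; the abstracts only state that the control function can be approximately decomposed into global and local components. The function name `fit_to_duration` and all numeric values are hypothetical.

```python
def fit_to_duration(ref_durations, local_rates, target_duration):
    """Rescale reference segment durations to hit a target total duration.

    Assumption (not from the source): the relative speech rate of segment i
    is global_rate * local_rates[i], so its new duration is
    ref_durations[i] / (global_rate * local_rates[i]).
    """
    if len(ref_durations) != len(local_rates):
        raise ValueError("ref_durations and local_rates must have equal length")
    if target_duration <= 0:
        raise ValueError("target_duration must be positive")
    if any(rate <= 0 for rate in local_rates):
        raise ValueError("local rates must be positive")

    # Total duration obtained when only the local components are applied.
    locally_scaled_total = sum(d / r for d, r in zip(ref_durations, local_rates))

    # Solve for the single global factor that makes the total equal the target.
    global_rate = locally_scaled_total / target_duration

    new_durations = [
        d / (global_rate * r) for d, r in zip(ref_durations, local_rates)
    ]
    return global_rate, new_durations


# Hypothetical example: three segments (in seconds) of a reference utterance,
# with made-up local rate factors, compressed to fit 1.2 s.
global_rate, new_durations = fit_to_duration(
    ref_durations=[0.4, 0.6, 0.5],
    local_rates=[1.0, 1.2, 0.8],
    target_duration=1.2,
)
```

Under this assumed multiplicative form, the global factor has a closed-form solution, so the target duration is met exactly while the relative differences between segments are preserved.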