This paper presents a voice conversion method based on transforming the characteristic features of a source speaker towards those of a target speaker. Voice characteristic features are grouped into two main categories: (a) the spectral features at formants and (b) the pitch and intonation patterns. Signal modelling and transformation methods for each group of voice features are outlined. The spectral features at formants are modelled using a set of two-dimensional phoneme-dependent HMMs. Subband frequency warping is used for spectrum transformation, with the subbands centred on the estimates of the formant trajectories. The F0 contour is used for modelling the pitch and intonation patterns of speech. A PSOLA-based method is employed for transformation of pitch, intonation patterns and speaking rate. The experiments present illustrations and perceptual evaluations of the results of transformations of the various voice features.
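As a rough illustration of the spectrum transformation idea, the sketch below warps a single magnitude spectrum so that source formant frequencies move to target formant frequencies. It is not the paper's subband method: it assumes a simpler piecewise-linear warp anchored at the formants, with 0 Hz and the Nyquist frequency held fixed. The function names `piecewise_linear_map` and `warp_spectrum`, and all numeric values, are hypothetical.

```python
from bisect import bisect_right


def piecewise_linear_map(x, xs, ys):
    """Evaluate the piecewise-linear function through the points (xs[i], ys[i]).

    xs must be strictly increasing and x must lie within [xs[0], xs[-1]].
    """
    i = bisect_right(xs, x) - 1
    i = max(0, min(i, len(xs) - 2))  # keep the segment index in range
    t = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + t * (ys[i + 1] - ys[i])


def warp_spectrum(mag, src_formants, tgt_formants, nyquist):
    """Resample a magnitude spectrum so source formants land on target formants.

    Assumption: a plain piecewise-linear warp stands in for the subband
    frequency warping described in the abstract.
    mag holds evenly spaced bins covering 0 Hz to nyquist inclusive.
    """
    n = len(mag)
    bin_hz = nyquist / (n - 1)
    src_anchors = [0.0] + list(src_formants) + [float(nyquist)]
    tgt_anchors = [0.0] + list(tgt_formants) + [float(nyquist)]
    warped = []
    for k in range(n):
        f_out = k * bin_hz
        # Inverse mapping: which source frequency feeds this output bin?
        f_in = piecewise_linear_map(f_out, tgt_anchors, src_anchors)
        pos = f_in / bin_hz
        lo = min(int(pos), n - 2)
        frac = pos - lo
        # Linear interpolation between the two neighbouring source bins.
        warped.append((1.0 - frac) * mag[lo] + frac * mag[lo + 1])
    return warped


# Toy spectrum: 9 bins over 0..4000 Hz with a single peak at 1000 Hz.
spectrum = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
warped = warp_spectrum(spectrum, src_formants=[1000.0],
                       tgt_formants=[1500.0], nyquist=4000.0)
```

In this toy case the peak that sat in the 1000 Hz bin is moved to the 1500 Hz bin. A real system would apply such a warp frame by frame, driven by the estimated formant trajectories.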
This paper presents an analysis of the acoustic correlates of the differences between British, Australian and American English accents. The differences that characterise accents in speech can be divided into two parts: (a) phonetic differences and (b) acoustic differences. The focus of this paper is on the analysis of the acoustic correlates of accents, including formants and their trajectories, pitch trajectory, pitch accent, pitch nucleus, duration and speaking rate. The acoustics of accents are modelled and estimated using two-dimensional HMMs of formants and a model of pitch such as the Rise/Fall/Connect (RFC) model. The differences between British, Broad Australian and General American English accents are discussed. The Australian accent has a lower 1st formant (F1) but a higher 2nd formant (F2) compared to British and American. The second formant is considered the most sensitive to accent identity. British speakers have the largest pitch frequency range and the largest initial pitch rise and final pitch fall rates in utterances. The Australian accent exhibits significant elongation of vowels and the lowest speaking rate compared to the other two accents. The differences in acoustic correlates across accents are used to morph the accent of a source speaker towards a target accent.