This paper describes an emotion conversion system that combines independent parameter transformation techniques to endow a neutral utterance with a desired target emotion. A set of prosody conversion methods have been developed which utilise a small amount of expressive training data ($15 min) and which have been evaluated for three target emotions: anger, surprise and sadness. The system performs F0 conversion at the syllable level while duration conversion takes place at the phone level using a set of linguistic regression trees. Two alternative methods are presented as a means to predict F0 contours for unseen utterances. Firstly, an HMM-based approach uses syllables as linguistic building blocks to model and generate F0 contours. Secondly, an F0 segment selection approach expresses F0 conversion as a search problem, where syllable-based F0 contour segments from a target speech corpus are spliced together under contextual constraints. To complement the prosody modules, a GMM-based spectral conversion function is used to transform the voice quality. Each independent module and the combined emotion conversion framework were evaluated through a perceptual study. Preference tests demonstrated that each module contributes a measurable improvement in the perception of the target emotion. Furthermore, an emotion classification test showed that converted utterances with either F0 generation technique were able to convey the desired emotion above chance level. However, F0 segment selection outperforms the HMM-based F0 generation method both in terms of emotion recognition rates as well as intonation quality scores, particularly in the case of anger and surprise. Using segment selection, the emotion recognition rates for the converted neutral utterances were comparable to the same utterances spoken directly in the target emotion.
Voicemail has become an integral part of our personal and professional communication. The number of messages that accumulate in our voice mailboxes necessitate new ways of prioritizing them. Currently, we are forced to actively listen to all messages in order to find out which ones are important and which ones can be attended to later on. In this paper, we describe Emotive Alert, a system that can detect some of the significant emotions in a new message and notify the account owner along various affective axes, including urgency, formality, valence (happy vs. sad) and arousal (calm vs. excited). We have used a purely acoustic, HMM-based approach for identifying the emotions, which allows application of this system to all messages independent of language.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.