“…Spectral features in this study include timbral features that have been successful in music recognition [61]. Timbral features define the quality of a sound [62] and stand in contrast to more general features such as pitch and intensity. It has been shown that a strong relationship exists between voice quality and the emotional content of speech [58].…”
Automatic recognition of emotion is important for facilitating seamless interaction between human beings and intelligent robots, toward the full realization of a smart society. Signal-processing and machine-learning methods are widely applied to recognize human emotions from features extracted from facial images, video files, or speech signals. However, these features have not been able to recognize the fear emotion with the same precision as other emotions. The authors propose the agglutination of prosodic and spectral features, drawn from a group of carefully selected features, into hybrid acoustic features for improving the emotion-recognition task. Experiments tested the effectiveness of the proposed features, which were extracted from speech files of two public databases and used to train five popular ensemble learning algorithms. Results show that random decision forest ensemble learning with the proposed hybrid acoustic features is highly effective for speech emotion recognition.
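As a rough illustration of what combining prosodic and spectral features can look like in practice, the sketch below pools one prosodic cue (frame energy) and two spectral/timbral cues (spectral centroid and zero-crossing rate) into a single feature vector. The function name, frame sizes, and the specific cues are illustrative assumptions, not the authors' actual feature set.

```python
import numpy as np

def hybrid_features(signal, sr, frame_len=1024, hop=512):
    """Toy hybrid feature vector: prosodic (frame energy) plus spectral
    (centroid, zero-crossing rate) statistics pooled over frames.
    Illustrative only; the paper's feature set is far richer."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    window = np.hanning(frame_len)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)
    energies, centroids, zcrs = [], [], []
    for f in frames:
        energies.append(np.sum(f ** 2))                 # prosodic: intensity
        spec = np.abs(np.fft.rfft(f * window))
        centroids.append((freqs * spec).sum() / (spec.sum() + 1e-12))  # timbral
        zcrs.append(np.mean(np.abs(np.diff(np.sign(f))) > 0))
    stats = lambda x: [float(np.mean(x)), float(np.std(x))]
    # Pool per-frame values into a fixed-length utterance-level vector.
    return np.array(stats(energies) + stats(centroids) + stats(zcrs))
```

The resulting fixed-length vector is what would then be fed to an ensemble learner such as a random forest.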
“…Secondly, beyond the ubiquitous valence and arousal, tension [29,32,36,40,44] is another common dimension. Thirdly, the felt emotion [28,35] is infrequently examined compared to the perceived emotion. Finally, most works employ a wide range of acoustic features, while the interest in a single feature moves from pitch [40,44] to timbre [28] and rhythm [26].…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…Thirdly, the felt emotion [28,35] is infrequently examined compared to the perceived emotion. Finally, most works employ a wide range of acoustic features, while the interest in a single feature moves from pitch [40,44] to timbre [28] and rhythm [26]. For the emotion space, Russell [13] proposed the valence/arousal space and Thayer [14] reduced it to four labels; the following synonyms are commonly used: valence as pleasantness [38]; arousal as activity [36,45]; tension [29,32,36,40,44] as interest [43], expectancy [40], strength [38], potency [38], and resonance [27].…”
There have been many psychological experiments and models for music, but few psychological experiments and models for rhythm. Likewise, there have been many physiological experiments for music but few physiological music models, and while a few physiological rhythm experiments exist, there has been no physiological rhythm model. We propose a physiological rhythm model to fill this gap. Twenty-two participants, four drum loops as stimuli, and electrocardiogram (ECG) recordings were employed in this work. We designed an algorithm to map tempo, complexity, and energy onto two heart rate variability (HRV) measures, the standard deviation of normal-to-normal heartbeats (SDNN) and the ratio of low- to high-frequency power (LF/HF); these two measures form a physiological valence/arousal plane. There were four major findings. First, simple and loud rhythms enhanced arousal. Second, removing fast and loud rhythms decreased arousal. Third, fast rhythms increased valence. Finally, removing fast and quiet rhythms increased valence. Our work extends the psychological model to a physiological one and deepens the musical model into a rhythmic model. Moreover, this model could supply rules for automatic music generation systems.
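The two HRV measures named above can be computed directly from a series of beat-to-beat (NN/RR) intervals. The sketch below shows one standard approach, not the authors' implementation: SDNN is the standard deviation of the intervals, and LF/HF is estimated from a periodogram of the interval series resampled onto a uniform grid, using the conventional LF (0.04–0.15 Hz) and HF (0.15–0.40 Hz) bands. The function name and the 4 Hz resampling rate are our own illustrative choices.

```python
import numpy as np

def hrv_measures(rr_ms, fs=4.0):
    """Return (SDNN, LF/HF) from a series of NN/RR intervals in milliseconds."""
    rr = np.asarray(rr_ms, dtype=float)
    sdnn = rr.std(ddof=1)                      # SDNN: std. dev. of NN intervals

    # RR intervals are irregularly spaced in time, so resample them
    # onto a uniform grid before spectral analysis.
    t = np.cumsum(rr) / 1000.0                 # beat times in seconds
    grid = np.arange(t[0], t[-1], 1.0 / fs)
    rr_even = np.interp(grid, t, rr)
    rr_even -= rr_even.mean()                  # remove DC before the FFT

    # Simple periodogram via the FFT.
    freqs = np.fft.rfftfreq(len(rr_even), d=1.0 / fs)
    power = np.abs(np.fft.rfft(rr_even)) ** 2

    lf = power[(freqs >= 0.04) & (freqs < 0.15)].sum()
    hf = power[(freqs >= 0.15) & (freqs <= 0.40)].sum()
    return sdnn, lf / hf
```

A slow ~0.1 Hz modulation of the heartbeat intervals, for example, falls in the LF band and drives the LF/HF ratio up, which is the kind of shift the authors' valence/arousal plane is built on.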