2013
DOI: 10.3389/fpsyg.2013.00292

On the Acoustics of Emotion in Audio: What Speech, Music, and Sound have in Common

Abstract: Without doubt, there is emotional information in almost any kind of sound received by humans every day: be it the affective state of a person transmitted by means of speech; the emotion intended by a composer while writing a musical piece, or conveyed by a musician while performing it; or the affective state connected to an acoustic event occurring in the environment, in the soundtrack of a movie, or in a radio play. In the field of affective computing, there is currently some loosely connected research concer…


Cited by 188 publications (169 citation statements)
References 32 publications (37 reference statements)
“…Another avenue for further development, especially when data for larger groups of singers and actors become available, is the use of machine learning to assess the degree of generalizability of the underlying acoustic profiles. A recent study by Weninger et al (2013) shows that algorithms trained on emotional music are quite successful on emotional speech and vice versa, suggesting that this may be a very viable approach. Weninger et al show that the effect generalizes, to some extent, even to environmental sounds.…”
Section: Results (mentioning)
confidence: 99%
“…Previous findings in [18] suggest that the expression of emotion in the speaking and singing voice is related. Further, [12] concludes that similar methods and acoustic features can be used to automatically classify emotions in speech and polyphonic music, as well as emotions that listeners perceive in, or associate with, other general sounds. This suggests that methods for speech emotion recognition can be transferred to singing emotion recognition.…”
Section: Related Work (mentioning)
confidence: 98%
“…[10][11][12]), although the fact that emotions are audible in acoustic properties of the voice has been frequently acknowledged [13,14]. In particular, emotions play a major role in music, and singers must be able to express a wide range of emotions with ease.…”
Section: Related Work (mentioning)
confidence: 99%
“…In this work, we employ the openSMILE feature extractor [36] to obtain the ComParE acoustic feature set of 65 low-level descriptors (LLDs) (4 energy-related, 55 spectral, and 6 voicing-related), which has been successfully applied for automatic recognition of paralinguistic phenomena [37]. The 65 LLDs used are summarized in Table 3 of [38].…”
Section: Features (mentioning)
confidence: 99%
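The ComParE LLDs referred to above are extracted with openSMILE; as a toy illustration of what frame-level low-level descriptors look like, the following sketch computes just two of them (RMS energy, an energy-related LLD, and zero-crossing rate, a crude voicing-related proxy) over 25 ms frames with a 10 ms hop. The function name, frame settings, and descriptor choice are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def frame_llds(signal, sr=16000, frame_len=0.025, hop=0.010):
    """Toy frame-level LLDs: RMS energy and zero-crossing rate per
    25 ms frame with a 10 ms hop. The real ComParE set has 65 LLDs
    and is produced by openSMILE; this only sketches the idea of
    frame-wise descriptor extraction."""
    n = int(frame_len * sr)   # samples per frame (400 at 16 kHz)
    h = int(hop * sr)         # hop size in samples (160 at 16 kHz)
    feats = []
    for start in range(0, len(signal) - n + 1, h):
        frame = signal[start:start + n]
        rms = np.sqrt(np.mean(frame ** 2))                    # energy-related
        zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2.0  # voicing proxy
        feats.append([rms, zcr])
    return np.asarray(feats)  # shape: (num_frames, num_llds)

# usage: 1 s of a 440 Hz tone sampled at 16 kHz
t = np.arange(16000) / 16000.0
x = np.sin(2 * np.pi * 440 * t)
F = frame_llds(x)
```

For a pure tone, the RMS column sits near 1/sqrt(2) and the zero-crossing rate reflects the tone's frequency, which makes the sketch easy to sanity-check.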
“…The audio features extracted for each sequence of the CONFER Database are down-sampled to a 25 Hz frequency to match the frame rate of the video stream. Similarly to [39,37], the audio features of each sequence are z-normalized (each feature component is normalized to mean = 0 and standard deviation = 1). Visual features.…”
Section: Features (mentioning)
confidence: 99%
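The per-sequence z-normalization described above can be sketched in a few lines of numpy: each feature column of a (frames × features) matrix is shifted to mean 0 and scaled to standard deviation 1. The matrix shape and epsilon guard below are illustrative assumptions, not details from the paper.

```python
import numpy as np

def z_normalize(features, eps=1e-8):
    """Z-normalize a (num_frames, num_features) matrix per sequence:
    each feature column gets mean 0 and standard deviation 1.
    `eps` guards against division by zero for constant columns."""
    mu = features.mean(axis=0, keepdims=True)
    sigma = features.std(axis=0, keepdims=True)
    return (features - mu) / (sigma + eps)

# usage: a toy 100-frame sequence with 65 feature components
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=3.0, size=(100, 65))
Z = z_normalize(X)
```

Normalizing per sequence (rather than over the whole corpus) removes speaker- and recording-level offsets from each feature stream before the audio and visual streams are fused.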