2005 IEEE International Conference on Multimedia and Expo
DOI: 10.1109/icme.2005.1521463
Comparing Feature Sets for Acted and Spontaneous Speech in View of Automatic Emotion Recognition

Abstract: We present a data-mining experiment on feature selection for automatic emotion recognition. Starting from more than 1000 features derived from pitch, energy and MFCC time series, the most relevant features with respect to the data are selected from this set by removing correlated features. The features selected for acted and realistic emotions are analysed and show significant differences. All features are computed automatically, and we also contrast automatic with manual units of analysis. A higher degree …
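The abstract's selection step (pruning correlated features from a large pool) can be sketched as follows. This is an illustrative assumption, not the paper's exact procedure: the greedy pass, the Pearson measure, and the 0.95 threshold are all choices made here for the example.

```python
import numpy as np

def prune_correlated_features(X, threshold=0.95):
    """Greedily keep a feature column only if its absolute Pearson
    correlation with every already-kept column is at most `threshold`.
    Returns the indices of the kept columns.

    Note: a simplified sketch; the paper's actual selection criterion
    may differ.
    """
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in kept):
            kept.append(j)
    return kept

# Toy example: column 1 is a linear copy of column 0 (correlation 1),
# column 2 is independent noise.
rng = np.random.default_rng(0)
base = rng.normal(size=100)
X = np.column_stack([base, base * 2.0 + 0.01, rng.normal(size=100)])
print(prune_correlated_features(X))  # column 1 is dropped as redundant
```

A greedy pass like this is order-dependent (earlier columns win ties), which is one reason real feature-selection pipelines often rank features by relevance before pruning.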

Cited by 176 publications (125 citation statements)
References 4 publications
“…Literature on speech (see for example Banse and Scherer [15]) shows that the majority of studies have been conducted with emotional acted speech. Feature sets for acted and spontaneous speech have recently been compared in [16]. Generally, few acted-emotion speech databases have included speakers with several different native languages.…”
Section: Related Work
confidence: 99%
“…It aligns well with the fact that elicited corpora mostly have neutral assets, or assets that can be considered not emotive enough [25], [36], [38], but at the same time this suggests a need for more sophisticated ways of performing the discretisation.…”
Section: Results
confidence: 87%
“…It has been stated that prosodic features are especially useful in the case of acted speech, but spectral features are valuable when recognising emotion from natural or elicited speech [38]. Some researchers also state the importance of lexical and contextual features [11], but their extraction usually requires the additional effort of annotating the speech corpus.…”
Section: Classifying Emotion In Speech
confidence: 99%
“…Such features include, but are not limited to, speech and its content, prosodic and paralinguistic features, eye gaze, facial expressions, body movements, or more advanced interpretations of such features such as the affective state, personality, mood or intentions of the user (e.g. [8,14,15]). …”
Section: Analysis Of Natural Interactions
confidence: 99%