2009
DOI: 10.1007/978-3-642-04274-4_92
The GMM-SVM Supervector Approach for the Recognition of the Emotional Status from Speech


Cited by 19 publications (10 citation statements)
References 16 publications
“…Analysis of human emotions and processing recorded data, for instance speech, facial expressions, hand gestures, and body movements, is a multidisciplinary field that has been emerging as a rich area of research in recent times [5,11,20,24,21,27]. In this paper multiple classifier systems for the classification of audio-visual features have been investigated; the numerical evaluation of the proposed emotion recognition systems has been carried out on the data sets of the AVEC challenge [23].…”
Section: Introduction
confidence: 99%
“…Considering that the human perception rate for the Emo-DB was set to 84% [43], this mean value of 82.45% can be seen as a promising result. Moreover, this score outperforms the results of other works in the literature over the Emo-DB, like the scores obtained in [43,74], which reached accuracies of 79% and 77%, respectively, although these works analyzed the whole database and used different machine learning algorithms and audio features. The overall results demonstrate the good performance of the CSS stacking classification paradigm and confirm the robustness of this classification system to deal with emotion recognition in speech over several conditions and datasets.…”
Section: Results
confidence: 50%
“…Six class-specific π-ESNs were trained independently over the sequences of the corresponding class. The π-ESNs were initialized as follows. Fed by the 21 input units, the reservoir consisted of 100 state neurons (with transfer function tanh).…”
Section: Model Selection and Comparison with Static Classifiers
confidence: 99%
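The quoted passage only states the reservoir dimensions (21 input units, 100 tanh state neurons). As a rough illustration, the sketch below builds such a reservoir and collects its state sequence; the spectral radius, input scaling, and the idea of training one readout per emotion class are assumptions not taken from the source, and the specifics of the cited π-ESN training procedure are not reproduced here.

```python
import numpy as np


class SimpleESNReservoir:
    """Minimal echo state network reservoir sketch matching the quoted
    configuration: 21 input units, 100 tanh state neurons.
    Spectral radius and input scaling are assumed values."""

    def __init__(self, n_inputs=21, n_reservoir=100,
                 spectral_radius=0.9, input_scaling=0.5, seed=0):
        rng = np.random.default_rng(seed)
        # Random input weights, scaled by an assumed input scaling factor.
        self.W_in = rng.uniform(-1.0, 1.0, (n_reservoir, n_inputs)) * input_scaling
        # Random recurrent weights, rescaled to the desired spectral radius
        # (a common recipe for obtaining the echo state property).
        W = rng.uniform(-0.5, 0.5, (n_reservoir, n_reservoir))
        W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
        self.W = W
        self.n_reservoir = n_reservoir

    def run(self, inputs):
        """Drive the reservoir with an input sequence of shape (T, 21)
        and return the reservoir state sequence of shape (T, 100)."""
        x = np.zeros(self.n_reservoir)
        states = []
        for u in inputs:
            x = np.tanh(self.W_in @ u + self.W @ x)
            states.append(x.copy())
        return np.asarray(states)
```

In the class-specific setup described in the quote, one such reservoir (with its own readout) would be trained per emotion class on the sequences of that class only.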
“…Vlasenko et al. [1] apply Gaussian mixture models (GMM) and hidden Markov models (HMM) defined at both the frame- and turn-level representations of the audio signals, while Wagner et al. [2] thoroughly analyze the behavior of HMMs and support vector machines (SVM) using Mel-cepstra [3] and energy-based features. Schwenker et al. [4] investigate the use of the SVM-GMM supervector approach relying on PLP and ModSpec features [5]. Dellaert et al. [6] classify speech signals into 4 broad classes of emotions by applying a mixture of k-nearest neighbor [7] experts (with k = 11) estimated on different subsets of…”
Section: Introduction
confidence: 99%
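The quoted passage names the GMM-SVM supervector idea of the cited paper without spelling it out. As a rough illustration only, the sketch below shows the usual pipeline: fit a universal background model on pooled frame features, MAP-adapt its means to each utterance, stack the adapted means into a supervector, and classify with an SVM. It assumes scikit-learn, a diagonal-covariance UBM, mean-only adaptation with relevance factor 16, and a linear kernel; none of these specifics, nor the PLP/ModSpec feature extraction, come from the source.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.svm import SVC


def train_ubm(frames, n_components=64, seed=0):
    """Fit a diagonal-covariance GMM (universal background model) on
    frame-level features pooled over all training utterances."""
    return GaussianMixture(n_components=n_components,
                           covariance_type="diag",
                           random_state=seed).fit(frames)


def map_adapted_supervector(ubm, frames, relevance=16.0):
    """MAP-adapt the UBM means to one utterance and stack them into a
    single supervector (mean-only adaptation, as commonly used in
    GMM-SVM supervector systems)."""
    resp = ubm.predict_proba(frames)             # (T, K) responsibilities
    n_k = resp.sum(axis=0) + 1e-10               # soft counts per component
    e_k = resp.T @ frames / n_k[:, None]         # first-order statistics
    alpha = (n_k / (n_k + relevance))[:, None]   # adaptation coefficients
    adapted = alpha * e_k + (1.0 - alpha) * ubm.means_
    return adapted.ravel()                       # (K * D,) supervector


# Usage sketch: `train_frames`, `utterance_frames`, and `labels` are
# hypothetical placeholders for frame-level acoustic features and
# emotion labels; they are not provided by the source.
# ubm = train_ubm(np.vstack(train_frames))
# X = np.array([map_adapted_supervector(ubm, f) for f in utterance_frames])
# clf = SVC(kernel="linear").fit(X, labels)
```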