2012
DOI: 10.1007/s10772-012-9127-7

Speaker-independent emotion recognition exploiting a psychologically-inspired binary cascade classification schema

Abstract: In this paper, a psychologically-inspired binary cascade classification schema is proposed for speech emotion recognition. Performance is enhanced because commonly confused pairs of emotions are distinguishable from one another. Extracted features are related to statistics of pitch, formants, and energy contours, as well as spectrum, cepstrum, perceptual and temporal features, autocorrelation, MPEG-7 descriptors, Fujisaki's model parameters, voice quality, jitter, and shimmer. Selected features are fed as inpu…
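To make the cascade idea concrete, here is a minimal, hypothetical sketch: each node routes a sample into one of two emotion groups with a binary classifier, descending the tree until a single emotion remains. The sklearn SVC choice, tree layout, and arousal-style groupings below are illustrative assumptions, not the paper's exact configuration.

```python
# Hypothetical sketch of a binary cascade classifier (assumed design,
# not the paper's exact schema): each node splits the remaining candidate
# emotions into two groups and routes the sample down the tree.
import numpy as np
from sklearn.svm import SVC

class CascadeNode:
    def __init__(self, left_labels, right_labels):
        self.left_labels = set(left_labels)    # emotions routed left
        self.right_labels = set(right_labels)  # emotions routed right
        self.clf = SVC(kernel="rbf")           # binary classifier at this node
        self.left = None                       # child nodes (None = leaf side)
        self.right = None

    def fit(self, X, y):
        # Train on samples whose label falls in either group, mapping the
        # multi-class labels to a binary target (0 = left, 1 = right).
        mask = np.array([lbl in self.left_labels or lbl in self.right_labels
                         for lbl in y])
        Xn, yn = X[mask], np.asarray(y)[mask]
        target = np.array([0 if lbl in self.left_labels else 1 for lbl in yn])
        self.clf.fit(Xn, target)
        for child in (self.left, self.right):
            if child is not None:
                child.fit(X, y)

    def predict_one(self, x):
        side = self.clf.predict(x.reshape(1, -1))[0]
        labels, child = ((self.left_labels, self.left) if side == 0
                         else (self.right_labels, self.right))
        if child is None:            # leaf: a single emotion remains
            return next(iter(labels))
        return child.predict_one(x)

# Example tree inspired by arousal groupings (assumed): first separate
# high-arousal from low-arousal emotions, then refine each branch.
root = CascadeNode(["anger", "joy", "fear"], ["sadness", "boredom", "neutral"])
root.left = CascadeNode(["anger"], ["joy", "fear"])
root.left.right = CascadeNode(["joy"], ["fear"])
root.right = CascadeNode(["sadness"], ["boredom", "neutral"])
root.right.right = CascadeNode(["boredom"], ["neutral"])
```

Because every decision is binary, frequently confused pairs (e.g., joy vs. anger) can be isolated into their own dedicated split, which is the motivation the abstract gives for the cascade.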

Cited by 67 publications (22 citation statements)
References 49 publications
“…The best results were achieved for the emotions of sadness and joy, the worst result was received for the emotion of anger (see values in Tables 17 and 18). It is not entirely consistent with the results obtained from other authors using the EMO-DB database for GMM emotion recognition [37][38][39] as well as those published in more complex comparison studies [40,41]. Usually, the best recognized emotions are anger and sadness followed by neutral state, the emotion joy generates the most confusion being recognized as anger [39].…”
Section: Discussion of Results (supporting)
confidence: 60%
“…A model with human-selected feature extraction (HSF) using the same data split and softmax configuration was trained on several widely used manufactured features including fundamental frequency [19], pitch related features [20], energy related features [21], zero crossing rate (ZCR) [21, 22], jitter [21], shimmer [21], and Mel-frequency cepstral coefficients (MFCC) [22–24]. As suggested in [19, 20], we applied the statistical functions including Maximum, Minimum, Range, Mean, Slope, Offset, Stddev, Skewness, Kurtosis, Variance, and Median for these features.…”
Section: Evaluation Results (mentioning)
confidence: 99%
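The statement above describes the common functional-based recipe: frame-level contours (MFCC, ZCR, energy, pitch) are collapsed into per-utterance statistics. Below is a minimal sketch of that recipe, assuming librosa for feature extraction and a placeholder wav path; it is an illustration of the listed functionals, not the cited paper's code.

```python
# Sketch of functional-based feature extraction (assumed implementation):
# summarize frame-level contours with the statistics named in the citation.
import numpy as np
import librosa
from scipy.stats import skew, kurtosis

def functionals(contour):
    """Summarize a 1-D frame-level contour with per-utterance statistics."""
    t = np.arange(len(contour))
    slope, offset = np.polyfit(t, contour, 1)  # linear trend over time
    return {
        "max": np.max(contour), "min": np.min(contour),
        "range": np.ptp(contour), "mean": np.mean(contour),
        "slope": slope, "offset": offset,
        "stddev": np.std(contour), "skewness": skew(contour),
        "kurtosis": kurtosis(contour), "variance": np.var(contour),
        "median": np.median(contour),
    }

y, sr = librosa.load("path/to/utterance.wav", sr=None)  # placeholder path
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)      # shape (13, n_frames)
zcr = librosa.feature.zero_crossing_rate(y)[0]          # shape (n_frames,)
rms = librosa.feature.rms(y=y)[0]                       # energy proxy

features = {}
for i, coeff in enumerate(mfcc):                        # one contour per MFCC
    for name, value in functionals(coeff).items():
        features[f"mfcc{i}_{name}"] = value
for label, contour in (("zcr", zcr), ("energy", rms)):
    for name, value in functionals(contour).items():
        features[f"{label}_{name}"] = value
```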
“…In fact, even CNN A alone outperformed HSF, further demonstrating the effectiveness of ConvNet-based feature selection. Although one could fine-tune the manually-selected features [21, 22], doing so would be highly laborious compared to automated ConvNet learning.…”
Section: Evaluation Results (mentioning)
confidence: 99%
“…Possible applications include a callcentre environment, where such an emotion recognition schema can be used to improve the quality of service. Furthermore, by discriminating negative from non-negative emotions, human-computer interaction designers will be able to recognize which parts of the interface are problematic, in the sense that they evoke negative emotions [22]. With respect to the audio, this is extracted from the audio-visual clips as monochannel wav files of a 48kHz sampling rate.…”
Section: Database (mentioning)
confidence: 99%
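For the extraction step the statement mentions (mono-channel wav at a 48 kHz sampling rate from audio-visual clips), a typical approach is an ffmpeg pass; the snippet below is a hedged sketch with placeholder file names, not the cited authors' pipeline, and requires ffmpeg on PATH.

```python
# Sketch: extract a mono 48 kHz wav track from an audio-visual clip
# using ffmpeg (assumed tooling; file names are placeholders).
import subprocess

def extract_audio(video_path: str, wav_path: str) -> None:
    subprocess.run(
        ["ffmpeg", "-y",
         "-i", video_path,   # input audio-visual clip
         "-vn",              # drop the video stream
         "-ac", "1",         # mono channel
         "-ar", "48000",     # 48 kHz sampling rate
         wav_path],
        check=True,
    )

extract_audio("clip.avi", "clip.wav")  # placeholder file names
```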