Interspeech 2017
DOI: 10.21437/interspeech.2017-917

Attentive Convolutional Neural Network Based Speech Emotion Recognition: A Study on the Impact of Input Features, Signal Length, and Acted Speech

Abstract: Speech emotion recognition is an important and challenging task in the realm of human-computer interaction. Prior work proposed a variety of models and feature sets for training a system. In this work, we conduct extensive experiments using an attentive convolutional neural network with multi-view learning objective function. We compare system performance using different lengths of the input signal, different types of acoustic features and different types of emotion speech (improvised/scripted). Our experiment…

Cited by 193 publications (176 citation statements)
References 34 publications
“…Figure 2 shows the confusion matrices of the proposed systems. In general, most of the emotion labels are frequently misclassified as neutral class, supporting the claims of [12,27]. The model confused between the excite and happy class since there exists a report of overlap in distinguishing these two classes even human evaluations [13].…”
Section: Performance Evaluation (supporting)
confidence: 58%
“…To show the effectiveness of the proposed method, we compare our method with currently advanced approaches through the five-folder cross validation. Compared with our proposed method, these approaches [31,32] also utilized mel-scale spectrograms as inputs, and showed promising results for speech emotion recognition. Neumann et al [31] proposed an attentive CNN with multi-view learning objective function for speech emotion recognition.…”
Section: Comparison To Other Advanced Approaches (mentioning)
confidence: 99%
“…Motivated by the success of deep learning techniques in various application domains, such as large scale image and speech recognition [4,5], several Deep Neural Network (DNN) or Convolutional Neural Network (CNN) based SER methods have recently been proposed [6,7,8,9,10,11,12]. In [6,7], a multistage procedure was applied, in which the DNN and CNN network were trained for frontend feature extraction, followed by a backend emotion recognizer such as SVM and Extreme Learning Machine (ELM).…”
Section: Introduction (mentioning)
confidence: 99%
“…Neumann el. al [12] further introduced an attention mechanism after the max-pooling operation. while Mirsamadi et.…”
Section: Introduction (mentioning)
confidence: 99%