2018 IEEE Spoken Language Technology Workshop (SLT)
DOI: 10.1109/slt.2018.8639633

Context-Aware Attention Mechanism for Speech Emotion Recognition

Abstract: In this work, we study the use of attention mechanisms to enhance the performance of the state-of-the-art deep learning model in Speech Emotion Recognition (SER). We introduce a new Long Short-Term Memory (LSTM)-based neural network attention model which is able to take into account the temporal information in speech during the computation of the attention vector. The proposed LSTM-based model is evaluated on the IEMOCAP dataset using a 5-fold cross-validation scheme and achieves 68.8% weighted accuracy on 4 c…
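For readers unfamiliar with attention pooling over recurrent encoders, the sketch below illustrates the general idea the abstract refers to: frame-level features are encoded by a (Bi)LSTM and a learned attention weighting aggregates the time steps before 4-class classification. This is a generic, hedged illustration in PyTorch (an assumption), not the paper's exact context-aware LSTM attention model; all layer sizes and the feature dimension are illustrative.

```python
# Hedged sketch (PyTorch assumed): a BiLSTM encoder with additive attention
# pooling over frame-level features, followed by a 4-class emotion classifier.
# Hyper-parameters are illustrative, not the authors' configuration.
import torch
import torch.nn as nn

class AttentiveLSTMSER(nn.Module):
    def __init__(self, n_features=32, hidden=128, n_classes=4):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True,
                            bidirectional=True)
        # Additive (Bahdanau-style) scoring of each time step.
        self.score = nn.Sequential(
            nn.Linear(2 * hidden, hidden),
            nn.Tanh(),
            nn.Linear(hidden, 1),
        )
        self.classifier = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):                    # x: (batch, time, n_features)
        h, _ = self.lstm(x)                  # h: (batch, time, 2*hidden)
        alpha = torch.softmax(self.score(h).squeeze(-1), dim=1)  # (batch, time)
        context = torch.bmm(alpha.unsqueeze(1), h).squeeze(1)    # weighted sum
        return self.classifier(context)      # logits over the 4 emotions
```

A forward pass on a batch of 8 utterances of 300 frames with 32-dimensional features, `AttentiveLSTMSER()(torch.randn(8, 300, 32))`, returns logits of shape (8, 4).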

Cited by 42 publications (43 citation statements)
References 12 publications
“…We investigate the proposed method using a simple LSTM model and a small-size Transformer model on the IEMOCAP dataset (Busso et al., 2008), composed of five acted sessions, for a four-class emotion classification, and we compare to the state-of-the-art model of Mirsamadi et al. (2017), a local-attention-based BiLSTM. Ramet et al. (2018) showed in their work a new model that is competitive with the one previously cited, following a cross-validation evaluation scheme. For a fair comparison, in this paper we focus on a non-cross-validation scheme and thus compare our results to the work of Mirsamadi et al. (2017), where a similar scheme is followed, using the fifth session of the IEMOCAP database as evaluation set.…”
Section: Related Work
confidence: 97%
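The excerpt above contrasts a 5-fold cross-validation protocol with a fixed hold-out protocol that keeps the fifth IEMOCAP session for evaluation. A minimal sketch of that split follows; it assumes the standard IEMOCAP utterance naming, where IDs start with the session tag (Ses01 … Ses05), which is not stated in the excerpt itself.

```python
# Minimal sketch of the session-based hold-out split described above.
# Assumes IEMOCAP utterance IDs begin with their session tag, e.g.
# "Ses05F_impro03_M012"; adjust if your metadata is organised differently.
def split_by_session(utterance_ids, eval_session="Ses05"):
    train = [u for u in utterance_ids if not u.startswith(eval_session)]
    test = [u for u in utterance_ids if u.startswith(eval_session)]
    return train, test
```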
“…OpenSMILE (Eyben et al., 2013) is used for extracting the features. We opt for the IS09 feature set (Schuller et al., 2009), as proposed by Ramet et al. (2018) and commonly used for SER.…”
Section: Toi in Speech Emotion Recognition
confidence: 99%
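As a concrete illustration of the feature-extraction step mentioned above, the following hedged sketch calls the openSMILE command-line extractor with the IS09 emotion configuration. The config path shown is the one shipped with recent openSMILE releases, but it varies between versions, so treat it as an assumption and point it at your local install.

```python
# Hedged sketch: extract the IS09 emotion feature set (384 functionals per
# utterance) with the openSMILE command-line tool. The config file location
# (config/is09-13/IS09_emotion.conf) is an assumption that depends on the
# installed openSMILE version; SMILExtract must be on the PATH.
import subprocess

def extract_is09(wav_path, out_arff,
                 config="config/is09-13/IS09_emotion.conf"):
    subprocess.run(
        ["SMILExtract", "-C", config, "-I", wav_path, "-O", out_arff],
        check=True,
    )

# Example: extract_is09("Ses01F_impro01_F000.wav", "is09_features.arff")
```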
“…The relatively small amount of training data in our case, only 5.5 hours of speech, could lead to a partial learning of the input representation. As for the engineered features, we evaluated our methodology on the IS09 [17] feature set (384 features) because it is a common set used for SER tasks and it has been used by [5] to obtain the latest state-of-the-art results. Even though it has not been used as extensively as IS09, we also extracted the eGeMAPS set [18]: this set has proven to be a good substitute for IS09 in several works, such as [19], [20] and [21].…”
Section: Input Features
confidence: 99%
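The excerpt above also mentions eGeMAPS as a substitute for IS09. A hedged sketch using the `opensmile` Python package (an assumption about tooling, not necessarily what the cited authors used) to obtain the eGeMAPS functionals:

```python
# Hedged sketch: eGeMAPS functionals via the audEERING `opensmile` Python
# package (pip install opensmile). eGeMAPSv02 yields 88 functionals per file.
import opensmile

smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.eGeMAPSv02,
    feature_level=opensmile.FeatureLevel.Functionals,
)
features = smile.process_file("Ses01F_impro01_F000.wav")  # 1 x 88 DataFrame
```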
“…The IEMOCAP database [23] was chosen for our experiments since it is established as a benchmark in the SER literature. Moreover, it contains audio recorded at a relatively high sample rate (16 kHz), both genders, 9 emotions, and both improvised and scripted speech, which the literature has shown to differ in complexity at inference time [4], [5]. Out of the 9 emotions we focused on four (angry, happy, neutral and sad) in order to have results comparable with previous research.…”
Section: Database
confidence: 99%
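To make the four-class setup described above concrete, the sketch below filters IEMOCAP annotations down to the angry/happy/neutral/sad subset. The abbreviated label strings are an assumption based on the common IEMOCAP annotation codes, and whether "excited" is merged into "happy" varies between studies and is not specified in the excerpt.

```python
# Hedged sketch: keep only the four target emotions from IEMOCAP annotations.
# The abbreviated label codes ("ang", "hap", "neu", "sad") are an assumption.
TARGET_LABELS = {"ang": "angry", "hap": "happy", "neu": "neutral", "sad": "sad"}

def filter_four_class(samples):
    """samples: iterable of (utterance_id, label) pairs."""
    return [(uid, TARGET_LABELS[lab]) for uid, lab in samples
            if lab in TARGET_LABELS]
```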