Interspeech 2018
DOI: 10.21437/interspeech.2018-1610
Attention-based Sequence Classification for Affect Detection

Abstract: This paper presents the Cogito submission to the Interspeech Computational Paralinguistics Challenge (ComParE), for the second sub-challenge. The aim of this second sub-challenge is to recognize self-assessed affect from short clips of speech-containing audio data. We adopt a sequence classification-based approach where we use a long short-term memory (LSTM) network for modeling the evolution of low-level spectral coefficients, with an added attention mechanism to emphasize salient regions of the audio clip. Addit…
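The abstract's core mechanism — soft attention pooling over frame-level LSTM outputs to emphasize salient regions of the clip — can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation; the parameter names `w`, `b`, and `u` are hypothetical.

```python
import numpy as np

def attention_pool(frames, w, b, u):
    """Soft attention pooling over a sequence of frame-level features.

    frames: (T, d) per-frame features (e.g. LSTM hidden states over
            low-level spectral coefficients).
    Returns a (d,) weighted summary and the (T,) attention weights.
    """
    scores = np.tanh(frames @ w + b) @ u   # (T,) unnormalized frame scores
    alpha = np.exp(scores - scores.max())  # stable softmax
    alpha = alpha / alpha.sum()            # weights sum to 1
    summary = alpha @ frames               # attention-weighted average
    return summary, alpha

# Toy usage: 50 frames of 8-dim features with random projection parameters.
rng = np.random.default_rng(0)
T, d = 50, 8
frames = rng.standard_normal((T, d))
w, b, u = rng.standard_normal((d, d)), rng.standard_normal(d), rng.standard_normal(d)
summary, alpha = attention_pool(frames, w, b, u)
```

The weighted summary would then feed a classifier head; frames with larger scores dominate the clip-level representation, which is the sense in which attention "emphasizes salient regions."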

Cited by 13 publications (9 citation statements) · References 11 publications (13 reference statements)
“…Recurrent Stage: Gated recurrent units (GRU) and long short-term memory units (LSTM) [54] are the two most common recurrent types in paralinguistics [9], [18], [19], [55], [56]. Unidirectional [9], [18] as well as bidirectional [19], [56] networks are popular.…”
Section: Hyperparameter Search Space
confidence: 99%
“…However, as these implementations do not allow altering the activation function or implementing recurrent batch normalization [57], we fixed the corresponding parameter ranges to the implementations' preset values. When using the recurrent stage as the first one in the network, we scanned for unit numbers of up to 128, as RNNs are commonly shallower and wider than CNNs [18], [55], [58].…”
Section: Hyperparameter Search Space
confidence: 99%
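The hyperparameter scan described in these excerpts — choosing a recurrent cell type, directionality, and a unit count of up to 128 — could be sampled as below. This is a hypothetical sketch of one random-search draw under the stated search space, not the cited authors' code.

```python
import random

def sample_recurrent_config(rng=random):
    """Draw one recurrent-stage configuration from the search space
    described above (GRU vs. LSTM, uni- vs. bidirectional, up to 128 units).
    """
    return {
        "cell": rng.choice(["gru", "lstm"]),
        "bidirectional": rng.choice([True, False]),
        "units": rng.choice([16, 32, 64, 128]),
    }

cfg = sample_recurrent_config()
```

Each draw would then be trained and scored, with the best configuration retained.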
“…When combined with a self-attention mechanism, emotionally informative time-segments of an input can be highlighted [60]. Mirsamadi et al used an attention RNN for SER on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) corpus, while Gorrostieta et al applied a similar model with low-level spectral features as input for the ComParE self-assessed affect [61] sub-challenge [62]. More recently, combining CNN feature extractors with attention-based RNNs has been shown to be a highly competitive approach to SER [63], [64].…”
Section: Deep Learning Based SER
confidence: 99%
“…In addition to learning useful spatio-temporal features, it is also important to select the emotionally salient sections of an input signal to improve SER performance further [11]. The use of attention mechanisms in RNN and CNN-based models has frequently been demonstrated as a useful tool to encourage a model to more heavily weight specific regions of an input sequence or image [12]. Attention mechanisms have also been effectively applied in SER [11], [13]- [15].…”
Section: Introduction
confidence: 99%