2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2018.8461431

Effective Attention Mechanism in Dynamic Models for Speech Emotion Recognition

Cited by 37 publications
(25 citation statements)
References 9 publications
“…As described in [7], many categorical SER-specific LSTMs essentially perform a sequence-to-label task. In order to learn high-level representations, different pooling strategies are adopted for these recurrent models: final-pooling, mean-pooling or weighted-pooling LSTMs with attention mechanism added [5,4,2,17]. As result of these pooling operations, important information may be lost from successive frames [18,7].…”
Section: Speech Emotion Recognition
Type: mentioning
confidence: 99%
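
The three pooling strategies named in the statement above can be illustrated with a minimal sketch. It assumes the recurrent model has already produced a matrix `h` of per-frame hidden states with shape (frames, hidden size). The function names, the parameter vector `u`, and the dot-product-plus-softmax scoring are illustrative assumptions, not necessarily the formulation used in the cited papers.

```python
import numpy as np

def final_pooling(h):
    """Use only the last frame's hidden state as the utterance vector."""
    return h[-1]

def mean_pooling(h):
    """Average the hidden states over all frames."""
    return h.mean(axis=0)

def attention_pooling(h, u):
    """Weighted average of frames; weights are a softmax over h @ u.

    `u` stands in for a learned parameter vector (hypothetical here).
    """
    scores = h @ u                      # one scalar score per frame, shape (T,)
    scores = scores - scores.max()      # shift for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights @ h, weights

# Toy stand-in for LSTM outputs: 50 frames, hidden size 8.
rng = np.random.default_rng(0)
h = rng.standard_normal((50, 8))
u = rng.standard_normal(8)

z_final = final_pooling(h)
z_mean = mean_pooling(h)
z_att, w = attention_pooling(h, u)
```

Each function collapses the frame axis into a single utterance-level vector, which is the step at which frame-level detail can be lost. With a zero scoring vector the attention weights become uniform and weighted pooling reduces to mean pooling.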
“…Trigeorgis et al and Panagiotis et al [6,16] propose an end-to-end CNN-LSTM model to capture temporal dynamics in single utterance for emotion prediction. Several recent studies [17,5,15,18,19] explored the attention mechanism to focus on the emotion-salient frames in an utterance. However, these methods perform speech emotion recognition on single speech segment without considering the context information in the dialogue.…”
Section: Related Work
Type: mentioning
confidence: 99%
“…Most of previous studies perform speech emotion recognition on single speech segment. Among them, the CNN-LSTM network has achieved the state-of-the-art performance to predict the emotion of a single utterance [4,5,6]. However, emotion is not an instantaneous state.…”
Section: Introduction
Type: mentioning
confidence: 99%
“…More recently, end-to-end training has dominated in SER due to its joint optimization of feature extractor and classifier. Additionally, intrinsic structures [4] [5] and efficient mechanism such as attention [6] [7] aim to refine emotional information in speech signal and produce more discriminative representations. From the perspective of loss function, however, fewer works are reported in SER despite there are successive state of the arts based on it in other domains [8] [9] [10].…”
Section: Introduction
Type: mentioning
confidence: 99%