2019 International Conference on Multimodal Interaction (ICMI 2019)
DOI: 10.1145/3340555.3355720

Multi-Attention Fusion Network for Video-based Emotion Recognition

Cited by 25 publications (13 citation statements)
References 17 publications
“…As for "Surprise" and "Disgust", the weaker performance might be due to a mixing of different emotions, which makes these categories difficult to classify correctly. We also observe that the proportion of these two emotions is the lowest in the training set, and similar results are found in [22], [29], [30], [77]. Finally, the proposed methods are further evaluated on the IEMOCAP database.…”
Section: E. Overall Comparison (supporting)
confidence: 72%
“…To improve emotion recognition performance, the mouth area was further divided into several subregions, as elaborated in [53], extracting LBP-TOP features from each subregion and concatenating the resulting features. In [30], a multiple attention fusion network (MAFN) was proposed by modeling human emotion recognition mechanisms.…”
Section: Audio-Visual Based Emotion Recognition (mentioning)
confidence: 99%
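To make the subregion idea in the excerpt above concrete, the sketch below splits a grayscale mouth crop into a grid of cells, computes a local binary pattern (LBP) histogram per cell, and concatenates them. It uses plain per-frame LBP from scikit-image rather than the full spatio-temporal LBP-TOP described in [53]; the grid size, LBP parameters, and function name `subregion_lbp_features` are illustrative assumptions, not the cited paper's exact setup.

```python
import numpy as np
from skimage.feature import local_binary_pattern

def subregion_lbp_features(mouth, grid=(2, 4), P=8, R=1):
    """Concatenate uniform-LBP histograms over a grid of subregions."""
    lbp = local_binary_pattern(mouth, P, R, method="uniform")
    n_bins = P + 2  # uniform codes 0..P, plus one bin for non-uniform patterns
    feats = []
    for band in np.array_split(lbp, grid[0], axis=0):       # split into row bands
        for cell in np.array_split(band, grid[1], axis=1):  # split into columns
            hist, _ = np.histogram(cell, bins=n_bins, range=(0, n_bins), density=True)
            feats.append(hist)
    return np.concatenate(feats)  # length: grid[0] * grid[1] * n_bins

# Example: a 2x4 grid on a 48x96 mouth crop gives an 80-dimensional vector.
crop = (np.random.rand(48, 96) * 255).astype(np.uint8)
print(subregion_lbp_features(crop).shape)  # (80,)
```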
“…Secondly, the relationship features of different layers are fully exploited by a bidirectional RNN with self-attention. Wang et al. [21] defined a multimodal domain adaptation method to capture the interaction between modalities. The performance of emotion recognition is evaluated using different CNN architectures and different CNN feature layers in [11].…”
Section: Recognizing Emotion From Videos (mentioning)
confidence: 99%
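As a rough illustration of the bidirectional-RNN-with-self-attention pattern mentioned in the last excerpt, here is a minimal PyTorch sketch: a BiGRU encodes per-frame features, single-head self-attention re-weights the time steps, and mean pooling feeds a linear classifier. The layer sizes, the GRU choice, and the class name `BiRNNSelfAttention` are assumptions for illustration, not the cited papers' exact architectures.

```python
import torch
import torch.nn as nn

class BiRNNSelfAttention(nn.Module):
    def __init__(self, feat_dim=512, hidden=128, n_classes=7):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.MultiheadAttention(2 * hidden, num_heads=1, batch_first=True)
        self.cls = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):                # x: (batch, frames, feat_dim)
        h, _ = self.rnn(x)               # BiGRU states: (batch, frames, 2*hidden)
        a, _ = self.attn(h, h, h)        # self-attention across time steps
        return self.cls(a.mean(dim=1))   # mean-pool over frames, then classify

logits = BiRNNSelfAttention()(torch.randn(2, 16, 512))  # -> shape (2, 7)
```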