2020
DOI: 10.1109/access.2020.3036877
Multimodal Attention Network for Continuous-Time Emotion Recognition Using Video and EEG Signals

Cited by 16 publications (7 citation statements)
References 50 publications
“…Fusing multimodal features can yield better emotion recognition. For example, Choi et al. [20] proposed a multimodal fusion network integrating the video and EEG modalities. To compute attention weights for facial video features and the corresponding EEG features, they described a multimodal attention network based on bilinear pooling with low-rank decomposition.…”
Section: Related Work
confidence: 99%
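
The low-rank bilinear pooling mentioned in this statement can be made concrete with a short sketch. The snippet below is a minimal illustration in PyTorch, not the architecture from Choi et al. [20]: the module name, layer sizes, and the choice of a single shared attention distribution over time steps are all assumptions made for exposition.

```python
# Minimal sketch of multimodal attention via low-rank bilinear pooling.
# All dimensions and names are illustrative assumptions, not the paper's.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LowRankBilinearAttention(nn.Module):
    def __init__(self, video_dim=256, eeg_dim=128, rank=64):
        super().__init__()
        # Low-rank factors: project each modality into a shared rank-d space,
        # so the element-wise product approximates a full bilinear interaction
        # at a fraction of the parameter cost.
        self.U = nn.Linear(video_dim, rank, bias=False)  # video factor
        self.V = nn.Linear(eeg_dim, rank, bias=False)    # EEG factor
        self.p = nn.Linear(rank, 1, bias=False)          # pooling vector

    def forward(self, video_feats, eeg_feats):
        # video_feats: (batch, T, video_dim); eeg_feats: (batch, T, eeg_dim)
        joint = torch.tanh(self.U(video_feats)) * torch.tanh(self.V(eeg_feats))
        scores = self.p(joint).squeeze(-1)               # (batch, T)
        attn = F.softmax(scores, dim=-1)                 # weights over time
        # Attention-weighted sums give one attended feature per modality.
        attended_video = (attn.unsqueeze(-1) * video_feats).sum(dim=1)
        attended_eeg = (attn.unsqueeze(-1) * eeg_feats).sum(dim=1)
        return attended_video, attended_eeg, attn
```

The attended features from the two modalities would then be fused (e.g., concatenated) and passed to a regression or classification head; that downstream stage is omitted here.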
“…Most of these works use images, audio and/or text as inputs [2-5, 20, 21]. In a few cases, physiological signals have been used to improve recognition from image, audio and text [22-24]. A few authors have described the use of multiple physiological signal modalities [6-8].…”
Section: Related Work
confidence: 99%
“…DL is exploited as a feature extractor on the feature map, and the extracted features are later combined by the classification method to identify different types of emotion. Choi et al. [13] introduced a multimodal fusion network that incorporates EEG and video modality networks. To compute the attention weights of the corresponding EEG and facial video features, a multimodal attention network using bilinear pooling based on low-rank decomposition is developed.…”
Section: Related Work
confidence: 99%