2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr42600.2020.01424
EmotiCon: Context-Aware Multimodal Emotion Recognition Using Frege’s Principle

Cited by 121 publications (73 citation statements)
References 40 publications
“…To counter that effect, we employ both localization and semantic information to evaluate the effect of each object on the primary agent. Mittal et al. [15] presented EmotiCon, based on Frege's Context Principle from psychology. Their method included three streams: multiple modalities of faces and gaits, background context, and socio-dynamic inter-agent interactions.…”
Section: A. Emotion Recognition (mentioning)
confidence: 99%
“…Those intermediate features are concatenated and denoted by X m . In the same spirit as [13], [15], in the scene-level stream we conceal the primary agent with a black box so that the model learns only the background information. The features produced by the scene-level stream are denoted by X s .…”
Section: A. Algorithm Overview (mentioning)
confidence: 99%
“…While human perception typically involves inferring the physical attributes of humans (detection [5,35,43,50], poses [3,4,8,25,28,41], shape [13,20,29,30], gaze [44], etc.), interpreting humans involves reasoning about finer details relating to human activity [6,24,27,48,49], behaviour [26,34], human-object visual relationship detection [23,33,36,37,39,40], and human-object interactions [23,32,33,36,37,39,40,42]. In this work, we investigate the problem of identifying Human-Object Interactions in videos.…”
Section: Introduction (mentioning)
confidence: 99%