Proceedings of the 20th ACM International Conference on Multimodal Interaction 2018
DOI: 10.1145/3242969.3264990

Group-Level Emotion Recognition Using Hybrid Deep Models Based on Faces, Scenes, Skeletons and Visual Attentions

Abstract: This paper presents a hybrid deep learning network submitted to the 6th Emotion Recognition in the Wild (EmotiW 2018) Grand Challenge [9], in the category of group-level emotion recognition. Advanced deep learning models trained individually on faces, scenes, skeletons and salient regions using visual attention mechanisms are fused to classify the emotion of a group of people in an image as positive, neutral or negative. Experimental results show that the proposed hybrid network achieves 78.98% and 68.08% clas…
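The abstract says the individually trained models are "fused" but does not say how in this excerpt. As an illustration only, the sketch below assumes a simple score-level (late) fusion: each model outputs a probability vector over positive/neutral/negative, and the vectors are combined by a weighted average before taking the argmax. The function names and the per-model weights are hypothetical; in practice such weights would typically be tuned on validation data, and the paper's actual fusion scheme may differ.

```python
from typing import Dict, List, Sequence

# Class order assumed for every model's output vector.
CLASSES = ("positive", "neutral", "negative")


def fuse_scores(model_probs: Dict[str, Sequence[float]],
                weights: Dict[str, float]) -> List[float]:
    """Weighted average of per-model class-probability vectors (late fusion).

    model_probs: hypothetical mapping such as {"face": [...], "scene": [...]}.
    weights: hypothetical non-negative weight per model name.
    """
    total = sum(weights[name] for name in model_probs)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    fused = [0.0] * len(CLASSES)
    for name, probs in model_probs.items():
        if len(probs) != len(CLASSES):
            raise ValueError(f"{name}: expected {len(CLASSES)} scores")
        w = weights[name] / total  # normalise so the fused vector still sums to 1
        for i, p in enumerate(probs):
            fused[i] += w * p
    return fused


def predict(model_probs: Dict[str, Sequence[float]],
            weights: Dict[str, float]) -> str:
    """Return the class label with the highest fused score."""
    fused = fuse_scores(model_probs, weights)
    return CLASSES[max(range(len(CLASSES)), key=fused.__getitem__)]


if __name__ == "__main__":
    # Made-up outputs for one image from four hypothetical branches.
    probs = {
        "face": [0.6, 0.3, 0.1],
        "scene": [0.2, 0.5, 0.3],
        "skeleton": [0.5, 0.3, 0.2],
        "attention": [0.4, 0.4, 0.2],
    }
    weights = {"face": 2.0, "scene": 1.0, "skeleton": 1.0, "attention": 1.0}
    print(predict(probs, weights))  # prints "positive"
```

With these made-up numbers the fused vector is roughly (0.46, 0.36, 0.18), so the face branch's extra weight tips the decision to "positive" even though the scene branch alone favours "neutral". A weighted average is only one option; learned fusion layers or stacking classifiers are common alternatives.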

Cited by 29 publications (34 citation statements); references 33 publications.

“…Faces, scenes and skeletons are also analysed with CNNs in [82], where on the face-level the CNN output is fed to an LSTM, and where on the scene-level an attention mask is placed over the image. Attention is also applied in [2], where next to faces, scenes, and skeletons, visual attentions are included (salient regions important for emotion detection, found by neural attention mechanisms) by feeding 16 salient regions to a CNN and LSTM. In the work of [?]…”
Section: Hybrid Approaches
“…Faces, scenes, and upper bodies: [34], [35]
Faces, scenes, and bodies/skeletons: [80], [81], [82]
Faces, scenes, skeletons, and visual attentions/objects: [2], [42]
Faces and objects: [83]
Faces, scenes, and places: [24]
…and scene analysis), or fusion of individual emotions in a bottom-up approach.…”
Section: Aspects Description Studies
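The statement above contrasts hybrid face/scene/skeleton models with "fusion of individual emotions in a bottom-up approach". A minimal sketch of one such bottom-up rule, assuming a hypothetical face-level classifier has already produced a positive/neutral/negative probability vector for every detected face: average the vectors across faces and take the argmax. This is an illustrative aggregation rule, not the method of [2] or of any other cited work.

```python
from typing import List, Sequence

CLASSES = ("positive", "neutral", "negative")


def group_emotion_bottom_up(face_probs: Sequence[Sequence[float]]) -> str:
    """Bottom-up group label: mean of per-face class probabilities, then argmax.

    face_probs: one probability vector per detected face, in CLASSES order
    (assumed to come from a hypothetical face-level emotion classifier).
    """
    if not face_probs:
        raise ValueError("need at least one detected face")
    for face in face_probs:
        if len(face) != len(CLASSES):
            raise ValueError(f"expected {len(CLASSES)} scores per face")
    n = len(face_probs)
    mean: List[float] = [sum(face[i] for face in face_probs) / n
                         for i in range(len(CLASSES))]
    return CLASSES[max(range(len(CLASSES)), key=mean.__getitem__)]


if __name__ == "__main__":
    # Made-up scores for three faces in one group image.
    faces = [
        [0.7, 0.2, 0.1],
        [0.1, 0.3, 0.6],
        [0.5, 0.4, 0.1],
    ]
    print(group_emotion_bottom_up(faces))  # prints "positive"
```

Averaging lets a confident face outweigh an uncertain one, unlike a plain majority vote over per-face labels. A purely bottom-up rule like this ignores the scene context that the hybrid approaches add.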
“…Holistic (scene-level) information is shown to be the important component in group-level classification in [10,12,24]. While analyzing the cohesiveness of a group of people, it is essential to understand the environments behind the people, e.g., students in a lecture tend to have a low cohesion level, while a group of people standing and protesting at a plaza probably have high cohesiveness.…”
Section: Scene Features