2023
DOI: 10.1111/coin.12607
A joint hierarchical cross‐attention graph convolutional network for multi‐modal facial expression recognition

Chujie Xu,
Yong Du,
Jingzi Wang
et al.

Abstract: Emotional recognition in conversations (ERC) is increasingly being applied in various IoT devices. Deep learning‐based multimodal ERC has achieved great success by leveraging diverse and complementary modalities. Although most existing methods try to adopt attention mechanisms to fuse different information, these methods ignore the complementarity between modalities. To this end, the joint cross‐attention model is introduced to alleviate this issue. However, multi‐scale feature information on different modalit…
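The joint cross-attention idea the abstract describes — letting one modality's features query another's so the two streams complement each other — can be illustrated with a minimal pure-Python sketch. This is not the paper's implementation; the function names, toy dimensions, and data are hypothetical, and it shows only single-head scaled dot-product cross-attention between two modalities.

```python
import math

def softmax(xs):
    # numerically stable softmax over a list of scores
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention: each query vector (one modality)
    attends over the other modality's keys and returns a weighted sum of
    that modality's values."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

# toy example: 2 visual-frame queries attend over 3 audio-frame keys/values
visual = [[1.0, 0.0], [0.0, 1.0]]
audio_k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
audio_v = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
fused = cross_attention(visual, audio_k, audio_v)
```

Because the attention weights sum to one, each fused row is a convex combination of the audio values, weighted toward the audio frames most aligned with that visual query — the "complementarity" the abstract refers to.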


Cited by 2 publications (2 citation statements)
References 61 publications
“…Considering the spatial locality of convolutional networks, researchers have proposed methods to enhance their facial expression understanding, such as introducing a pyramid structure [25] to reduce the loss of effective information in deeper models, or multi-scale convolutional mechanisms [26] to improve the ability to capture subtle changes in expression. Other researchers have used graph convolutional networks [10,11] to capture expression features from dynamic sequences, or have incorporated different attention mechanisms depending on the task requirements. Minaee et al. [7] proposed an attention-based convolutional network that focuses on different parts of the facial image to perform the FER task.…”
Section: Visible Light Facial Expression Recognition
confidence: 99%
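The multi-scale convolutional mechanism this citing paper mentions — applying kernels of several widths so both fine and coarse expression changes contribute to the features — can be sketched in a few lines of pure Python. This is an illustrative toy, not the cited method [26]; the 1-D signal, kernel sizes, and max-pooling choice are all assumptions for demonstration.

```python
def conv1d(signal, kernel):
    # valid (no-padding) 1-D convolution of a signal with a kernel
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def multi_scale_features(signal, kernels):
    """Apply kernels of different sizes to the same signal and max-pool
    each response, so each scale contributes one feature value."""
    feats = []
    for kern in kernels:
        resp = conv1d(signal, kern)
        feats.append(max(resp))  # max-pool over positions at this scale
    return feats

# toy intensity signal and averaging kernels at three scales
signal = [0.0, 1.0, 0.0, 2.0, 2.0, 2.0, 0.0]
kernels = [[1.0], [0.5, 0.5], [1 / 3, 1 / 3, 1 / 3]]
feats = multi_scale_features(signal, kernels)  # one pooled value per scale
```

The narrow kernel responds to the isolated spike while the wide kernel responds only to the sustained plateau, which is why combining scales helps capture subtle expression changes.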
“…However, the spatial locality of convolutional networks makes it difficult for the model to learn the dependencies between different facial regions, limiting its understanding of global facial expressions. To overcome these problems, improved models based on convolutional networks have been proposed, such as attention mechanisms [7] and graph convolutional networks [10,11], which aim to strengthen the model's ability to learn dependencies between facial regions, better capture global facial expression information, and improve expression recognition. In recent years, the Transformer architecture [12] has achieved remarkable success in natural language processing, and inspired by that success, ViT (Vision Transformer) [13] has been introduced into image classification and achieved strong results through its non-local attention mechanism.…”
Section: Introduction
confidence: 99%
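The graph-convolution step this citing paper invokes for modeling dependencies between facial regions can be sketched as a single layer in pure Python: aggregate each node's neighbours through a row-normalised adjacency with self-loops, then apply a linear projection and ReLU. This is a generic GCN sketch under those assumptions, not the specific models [10,11]; the three "facial-region" nodes and identity weight matrix are hypothetical.

```python
def gcn_layer(adj, feats, weight):
    """One graph-convolution step: aggregate neighbour features via a
    row-normalised adjacency (with self-loops), then project linearly
    and apply ReLU."""
    n = len(adj)
    # add self-loops, then row-normalise so each row sums to 1
    a = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    a = [[v / sum(row) for v in row] for row in a]
    # aggregate: A_hat @ feats
    agg = [[sum(a[i][k] * feats[k][j] for k in range(n))
            for j in range(len(feats[0]))] for i in range(n)]
    # project: ReLU(agg @ weight)
    return [[max(0.0, sum(agg[i][k] * weight[k][j]
                          for k in range(len(weight))))
             for j in range(len(weight[0]))] for i in range(n)]

# 3 facial-region nodes in a chain, 2-dim features, identity projection
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W = [[1.0, 0.0], [0.0, 1.0]]
out = gcn_layer(adj, feats, W)
```

After one layer, each region's features mix with its neighbours', which is exactly how a GCN lets distant facial regions influence one another once layers are stacked.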