2021
DOI: 10.1609/aaai.v35i4.16465

Robust Lightweight Facial Expression Recognition Network with Label Distribution Training

Abstract: This paper presents an efficiently robust facial expression recognition (FER) network, named EfficientFace, which holds much fewer parameters but more robust to the FER in the wild. Firstly, to improve the robustness of the lightweight network, a local-feature extractor and a channel-spatial modulator are designed, in which the depthwise convolution is employed. As a result, the network is aware of local and global-salient facial features. Then, considering the fact that most emotions occur as combinations, mi…

Citations: cited by 134 publications (38 citation statements)
References: references 51 publications (65 reference statements)
“…RAF-DB dataset is one of the most widely used large-scale real-world FER datasets because it facilitates fair comparisons, in which all images are cropped and do not require any additional preprocessing. Results show that our FaceFormer achieves state-of-the-art performance compared to all other methods, including FER with unconstrained variations (RAN [11], MA-Net [13], IPD-FER [58]), and FER with annotation ambiguity (SCN [30], DMUE [28], KTN [59], EfficientFace [26], SPLDL [29], EASE [32]). In particular, when compared to TransFER [23], the previous best achieved by combining CNN and ViT, FER-former lowers the error rate from 9.09% to 8.7%, a 4.3% improvement.…”
Section: A Comparison With State-of-the-art Methods (citation type: mentioning)
confidence: 98%
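
The "4.3% improvement" quoted in the statement above appears to be a relative reduction in error rate rather than an absolute gain in accuracy. A minimal Python sketch of that arithmetic follows; the function name `relative_error_reduction` is hypothetical and is not taken from the citing paper.

```python
def relative_error_reduction(old_error_pct: float, new_error_pct: float) -> float:
    """Return the relative reduction in error rate, as a percentage.

    Hypothetical helper for illustration; both inputs are error rates in percent.
    """
    if old_error_pct <= 0:
        raise ValueError("old_error_pct must be positive")
    return (old_error_pct - new_error_pct) / old_error_pct * 100.0


# Error rates quoted in the citation statement: 9.09% (TransFER) and 8.7%.
reduction = relative_error_reduction(9.09, 8.7)
print(f"{reduction:.1f}%")
```

The absolute difference is 0.39 percentage points, which is why the relative figure is the larger of the two.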
“…Label distribution learning is an intuitive and favoured scheme to reduce the ambiguity. Zhao et al [26] treat the output of an auxiliary ResNet-50 as probability distribution to guide the learning of the other backbone network. Shao et al [29] further adopt an auxiliary network as a label distribution generator to generate label distributions for guiding the backbone network training and selecting easy samples.…”
Section: Related Work (citation type: mentioning)
confidence: 99%
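
The statement above describes using an auxiliary network's output as a probability distribution that guides the training of a backbone network. One common way to express that idea is cross-entropy against a soft target distribution, sketched below in plain Python. This is an illustration under stated assumptions, not the loss used in the cited papers: the three-class setup, the logit values, and the function names are all made up for the example.

```python
import math


def softmax(logits):
    """Convert raw scores into a probability distribution (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_label_cross_entropy(backbone_logits, target_distribution):
    """Cross-entropy between a soft target distribution and the backbone's prediction."""
    probs = softmax(backbone_logits)
    return -sum(t * math.log(p) for t, p in zip(target_distribution, probs) if t > 0)


# Hypothetical logits over three expression classes.
auxiliary_logits = [2.0, 1.0, 0.1]   # output of the auxiliary network (assumed values)
backbone_logits = [1.5, 0.5, 0.2]    # output of the backbone being trained (assumed values)

# The auxiliary output, passed through softmax, serves as the label distribution.
target = softmax(auxiliary_logits)
loss = soft_label_cross_entropy(backbone_logits, target)
print(f"soft-label loss: {loss:.4f}")
```

The loss is smallest when the backbone's predicted distribution matches the target distribution, which is what lets a soft label carry information about mixed or ambiguous expressions that a single hard label would discard.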
“…Savchenko et al [38] first verified the effectiveness of CNNs such as MobileNet [19], EfficientNet [41] and RexNet [15] for FER. Zhao et al proposed an efficient and robust FER network EfficientFace [57] for the analysis of facial expressions in the wild. Nevertheless, convolution-based FER algorithms cannot consider the global information of the image due to the limitation of convolutional local receptive field.…”
Section: Introduction (citation type: mentioning)
confidence: 99%
“…Facial expressions are one of the prominent ways to correctly infer an individual's mood [33] and thus can fulfill the above vision if the smartphone can monitor the temporal changes of its user's facial expressions. Interestingly, there have been decades of research on inferring facial expressions from video or image-based data [3,34,36,51]; however, these works are not suitable to fulfill the above vision of developing a pervasive smartphone application because of the following reasons. Firstly, image and video processing is computationally heavy and consumes a significant amount of system resources and energy.…”
Section: Introduction (citation type: mentioning)
confidence: 99%