Disentangling Identity and Pose for Facial Expression Recognition

Jiang, Jing; Deng, Weihong

doi:10.1109/taffc.2022.3197761

Cited by 19 publications

(8 citation statements)

References 46 publications

“…Comparison on FERPlus dataset is shown in Table II. It can be seen that our FER-former achieved the best FER performance compared to other methods, including FER with unconstrained variations (RAN [11], IPD-FER [58]), and FER…”

Section: A Comparison With State-of-the-art Methods (mentioning)

confidence: 92%

“…For RAF-DB and FERPlus datasets, the pre-trained IR-50 [51] on Ms-Celeb-1M [52] is adopted as a feature extractor, which is consistent with TransFER [23]. For SFEW 2.0 dataset, we pre-train FER-former on RAF-DB dataset and then finetune it on SFEW 2.0 dataset, which is consistent with IPD-FER [58]. Regarding our scratch-trained Transformer encoder, the depth is 16, the embedding dimension is 256, the number of heads is 4, and the mlp ratio is 4.…”

Section: Methods (mentioning)

confidence: 93%
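The encoder hyperparameters quoted in the statement above imply a per-head width and an MLP hidden width. The following is a minimal sketch of that arithmetic, not the citing authors' code: the class name `EncoderConfig` and its properties are hypothetical, and it assumes the usual Vision Transformer convention that the MLP hidden size equals the embedding dimension times the mlp ratio.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncoderConfig:
    """Hypothetical container for the hyperparameters quoted in the statement."""

    depth: int = 16        # number of encoder blocks
    embed_dim: int = 256   # token embedding dimension
    num_heads: int = 4     # attention heads per block
    mlp_ratio: int = 4     # assumed: hidden width = embed_dim * mlp_ratio

    @property
    def head_dim(self) -> int:
        # Multi-head attention splits the embedding evenly across heads.
        if self.embed_dim % self.num_heads != 0:
            raise ValueError("embed_dim must be divisible by num_heads")
        return self.embed_dim // self.num_heads

    @property
    def mlp_hidden_dim(self) -> int:
        # Width of the feed-forward layer under the assumed convention.
        return self.embed_dim * self.mlp_ratio


cfg = EncoderConfig()
print(cfg.head_dim, cfg.mlp_hidden_dim)
```

Under these assumptions each attention head works on 64-dimensional slices and the feed-forward layer is 1024 units wide.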

“…RAF-DB dataset is one of the most widely used large-scale real-world FER datasets because it facilitates fair comparisons, in which all images are cropped and do not require any additional preprocessing. Results show that our FaceFormer achieves state-of-the-art performance compared to all other methods, including FER with unconstrained variations (RAN [11], MA-Net [13], IPD-FER [58]), and FER with annotation ambiguity (SCN [30], DMUE [28], KTN [59], EfficientFace [26], SPLDL [29], EASE [32]). In particular, when compared to TransFER [23], the previous best achieved by combining CNN and ViT, FER-former lowers the error rate from 9.09% to 8.7%, a 4.3% improvement.…”

Section: A Comparison With State-of-the-art Methods (mentioning)

confidence: 98%
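The "4.3% improvement" quoted in the statement above appears to be a relative reduction in error rate rather than an absolute gain in accuracy (the absolute difference is 0.39 percentage points). A minimal sketch of that arithmetic, using only the two error rates given in the statement; the function name is hypothetical:

```python
def relative_error_reduction(baseline_err: float, new_err: float) -> float:
    """Return the reduction in error as a percentage of the baseline error."""
    if baseline_err <= 0:
        raise ValueError("baseline_err must be positive")
    return (baseline_err - new_err) / baseline_err * 100.0


# Error rates (in percent) as quoted in the citation statement.
reduction = relative_error_reduction(9.09, 8.7)
print(f"{reduction:.1f}%")
```

Reading the figure this way keeps it consistent with the quoted error rates: 0.39 / 9.09 is roughly 0.043.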

“…
Methods              Years  Acc(%)
RAN [11]             2020   86.90
SCN [30]             2020   88.14
DLN [60]             2021   86.40
KTN [59]             2021   88.07
MA-Net [13]          2021   88.40
DMUE [28]            2021   89.42
EfficientFace [26]   2021   88.36
TransFER [23]        2021   90.91
IPD-FER [58]         2022   88.89
CRS-CONT [61]        2022   88.07
SPLDL [29]           2022   89.08
EASE [32]            2022   89.56
FER-former (Ours)    2023   91.30
…”

Section: Methods Years Acc(%) (mentioning)

confidence: 99%

FER-former: Multi-modal Transformer for Facial Expression Recognition

Li, Wang, Gong et al. 2023. Preprint

The ever-increasing demands for intuitive interactions in Virtual Reality have triggered a boom in the realm of Facial Expression Recognition (FER). To address the limitations in existing approaches (e.g., narrow receptive fields and homogenous supervisory signals) and further cement the capacity of FER tools, a novel multifarious supervision-steering Transformer for FER in the wild is proposed in this paper. Referred to as FER-former, our approach features multi-granularity embedding integration, hybrid self-attention scheme, and heterogeneous domain-steering supervision. Specifically, to dig deep into the merits of the combination of features provided by prevailing CNNs and Transformers, a hybrid stem is designed to cascade two types of learning paradigms simultaneously. Wherein, a FER-specific transformer mechanism is devised to characterize conventional hard one-hot label-focusing and CLIP-based text-oriented tokens in parallel for final classification. To ease the issue of annotation ambiguity, a heterogeneous domains-steering supervision module is proposed to make image features also have text-space semantic correlations by supervising the similarity between image features and text features. On top of the collaboration of multifarious token heads, diverse global receptive fields with multi-modal semantic cues are captured, thereby delivering superb learning capability. Extensive experiments on popular benchmarks demonstrate the superiority of the proposed FER-former over the existing state-of-the-arts.

“…Ruan et al. [6] learned intraclass features and interclass features by decomposing and reconstructing. Jiang et al. [30] proposed an identity and pose disentangled method, which separates expression features from the identity and pose.…”

Section: Related Work (mentioning)

confidence: 99%

Facial Expression Recognition Using Local Sliding Window Attention

Qiu, Zhao et al. 2023. Sensors

There are problems associated with facial expression recognition (FER), such as facial occlusion and head pose variations. These two problems lead to incomplete facial information in images, making feature extraction extremely difficult. Most current methods use prior knowledge or fixed-size patches to perform local cropping, thereby enhancing the ability to acquire fine-grained features. However, the former requires extra data processing work and is prone to errors; the latter destroys the integrity of local features. In this paper, we propose a local Sliding Window Attention Network (SWA-Net) for FER. Specifically, we propose a sliding window strategy for feature-level cropping, which preserves the integrity of local features and does not require complex preprocessing. Moreover, the local feature enhancement module mines fine-grained features with intraclass semantics through a multiscale depth network. The adaptive local feature selection module is introduced to prompt the model to find more essential local features. Extensive experiments demonstrate that our SWA-Net model achieves a comparable performance to that of state-of-the-art methods with scores of 90.03% on RAF-DB, 89.22% on FERPlus, 63.97% on AffectNet.
