2021 | Preprint
DOI: 10.48550/arxiv.2107.04187

A Multi-modal and Multi-task Learning Method for Action Unit and Expression Recognition

Abstract: Analyzing human affect is vital for human-computer interaction systems. Most methods are developed in restricted scenarios, which are not practical for in-the-wild settings. The Affective Behavior Analysis in-the-wild (ABAW) 2021 Contest provides a benchmark for this in-the-wild problem. In this paper, we introduce a multi-modal and multi-task learning method that uses both visual and audio information. We use both AU and expression annotations to train the model and apply a sequence model to further extract ass…
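As a reading aid, here is a minimal sketch of the kind of architecture the abstract describes: visual and audio features fused, passed through a sequence model, and decoded by separate AU and expression heads. All layer sizes, the GRU choice, and the equal loss weighting are assumptions, not the authors' design.

```python
# Minimal sketch (not the authors' code) of a multi-modal, multi-task model:
# concatenated visual + audio features -> temporal model -> two task heads.
import torch
import torch.nn as nn

class MultiModalMultiTask(nn.Module):
    def __init__(self, vis_dim=512, aud_dim=128, hidden=256,
                 num_aus=12, num_expr=7):
        super().__init__()
        self.fuse = nn.Linear(vis_dim + aud_dim, hidden)      # simple concat fusion
        self.seq = nn.GRU(hidden, hidden, batch_first=True)   # sequence model
        self.au_head = nn.Linear(hidden, num_aus)             # multi-label AUs
        self.expr_head = nn.Linear(hidden, num_expr)          # single-label expression

    def forward(self, vis, aud):
        # vis: (B, T, vis_dim) frame features; aud: (B, T, aud_dim) audio features
        x = torch.relu(self.fuse(torch.cat([vis, aud], dim=-1)))
        x, _ = self.seq(x)
        return self.au_head(x), self.expr_head(x)

# Joint training with both annotation types: BCE for AUs, cross-entropy for
# expressions, summed with equal weights (an assumption).
model = MultiModalMultiTask()
vis, aud = torch.randn(2, 16, 512), torch.randn(2, 16, 128)
au_logits, expr_logits = model(vis, aud)
au_loss = nn.BCEWithLogitsLoss()(au_logits,
                                 torch.randint(0, 2, (2, 16, 12)).float())
expr_loss = nn.CrossEntropyLoss()(expr_logits.reshape(-1, 7),
                                  torch.randint(0, 7, (2 * 16,)))
loss = au_loss + expr_loss
```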

Cited by 8 publications (13 citation statements) | References 16 publications
“…Since we trained the model for a single task and did not use any audio or video features, its performance is not as good as that of teams using multi-task learning with video features.

Team | F1 | Accuracy | Total
[27] | 0.29 | 0.6491 | 0.4082
NTUA-CVSP [28] | 0.3367 | 0.6418 | 0.4374
Morphoboid [29] | 0.3511 | 0.668 | 0.4556
FLAB2021 [30] | 0.4079 | 0.6729 | 0.4953
STAR [31] | 0.4759 | 0.7321 | 0.5604
Maybe Next Time [32] | 0.6046 | 0.7289 | 0.6456
CPIC-DIR2021 [33] | 0.6834 | 0.7709 | 0.7123
Netease Fuxi Virtual Human [34] | 0.763 | 0.8059 | 0.7777
Ours [18] | 0.361 | 0.675 | 0.4646

Table 3 shows the influence of the number of networks that are collaboratively trained in CCT. It can be observed that the model with 3 networks performs best in the presence of noise.…”
Section: Performance Comparison With State-of-the-art Methods
Mentioning confidence: 99%
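The column labels in the reconstructed table above are an inference, not stated in the quote: the last value per row matches 0.67 · F1 + 0.33 · accuracy within rounding, which is the combined score used for the ABAW2 expression track. A quick sanity check in Python:

```python
import math

# Check the inferred column reading (F1, accuracy, total):
# total = 0.67 * F1 + 0.33 * accuracy, the ABAW2 expression-track score.
# Rows are taken from the table above; agreement is within rounding.
rows = [(0.3367, 0.6418, 0.4374),   # NTUA-CVSP [28]
        (0.4759, 0.7321, 0.5604),   # STAR [31]
        (0.763, 0.8059, 0.7777)]    # Netease Fuxi Virtual Human [34]
for f1, acc, total in rows:
    assert math.isclose(0.67 * f1 + 0.33 * acc, total, abs_tol=1e-3)
```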
“…The third ABAW Competition, to be held in conjunction with the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022, is a continuation of the first [24] and second [32] ABAW Competitions, held in conjunction with the IEEE Conference on Face and Gesture Recognition (IEEE FG) 2020 and with the International Conference on Computer Vision (ICCV) 2021, respectively. Those competitions targeted dimensional affect (in terms of valence and arousal) [2-4, 8, 9, 11, 21, 35, 39, 47, 48, 50, 54-56], categorical affect (in terms of the basic expressions) [12, 15, 16, 33, 36, 37, 51], and facial action unit analysis and recognition [7, 19, 20, 25, 26, 40, 44, 47]. The third ABAW Competition contains four Challenges, all based on the same in-the-wild database: (i) the uni-task Valence-Arousal Estimation Challenge; (ii) the uni-task Expression Classification Challenge (for the 6 basic expressions, plus the neutral state, plus the 'other' category denoting expressions/affective states other than the 6 basic ones); (iii) the uni-task Action Unit Detection Challenge (for 12 action units); and (iv) the Multi-Task Learning Challenge (for jointly learning and predicting valence-arousal, 8 expressions (6 basic plus neutral plus 'other'), and 12 action units).…”
Section: Introduction
Mentioning confidence: 99%
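For concreteness, the label spaces the quoted passage enumerates can be written down as follows. The specific list of 12 AUs is the standard Aff-Wild2 annotation set and is an assumption here, since the quote does not name them.

```python
# Label spaces of the four ABAW Challenges described above. The concrete AU
# numbers are the Aff-Wild2 annotation set, assumed here (not in the quote).
VALENCE_AROUSAL_RANGE = (-1.0, 1.0)  # (i) two continuous values per frame
EXPRESSIONS = ["neutral", "anger", "disgust", "fear",
               "happiness", "sadness", "surprise", "other"]  # (ii) 8 classes
ACTION_UNITS = [1, 2, 4, 6, 7, 10, 12, 15, 23, 24, 25, 26]  # (iii) 12 AUs
# (iv) the Multi-Task Learning Challenge predicts all three outputs jointly.
```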
“…Many facial analysis tasks, such as face recognition and age and gender prediction, have reached accuracy high enough for many practical applications [1,18]. However, the ability to understand human emotions is still far from maturity [6]. Personal biases and backgrounds increase the uncertainty of emotion perception and of contextual information [3].…”
Section: Introduction
Mentioning confidence: 99%
“…Though FER has been a topic of major research [13], many models learn features specific to a particular dataset, which is not practical for in-the-wild settings [6]. The development of in-the-wild affect prediction engines has been accelerated by a series of ABAW (Affective Behavior Analysis in-the-wild) competitions [8,14].…”
Section: Introduction
Mentioning confidence: 99%