2021 | Preprint
DOI: 10.48550/arxiv.2107.04187

A Multi-modal and Multi-task Learning Method for Action Unit and Expression Recognition

Abstract: Analyzing human affect is vital for human-computer interaction systems. Most methods are developed in restricted scenarios, which are not practical for in-the-wild settings. The Affective Behavior Analysis in-the-wild (ABAW) 2021 Contest provides a benchmark for this in-the-wild problem. In this paper, we introduce a multi-modal and multi-task learning method that uses both visual and audio information. We use both AU and expression annotations to train the model and apply a sequence model to further extract ass…
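As a reading aid, here is a minimal sketch of the kind of architecture the abstract describes: visual and audio features fused, passed through a sequence model, and decoded by separate AU and expression heads. All layer sizes, the GRU choice, and the equal loss weighting are assumptions, not the authors' design.

```python
# Minimal sketch (not the authors' code) of a multi-modal, multi-task model:
# concatenated visual + audio features -> temporal model -> two task heads.
import torch
import torch.nn as nn

class MultiModalMultiTask(nn.Module):
    def __init__(self, vis_dim=512, aud_dim=128, hidden=256,
                 num_aus=12, num_expr=7):
        super().__init__()
        self.fuse = nn.Linear(vis_dim + aud_dim, hidden)      # simple concat fusion
        self.seq = nn.GRU(hidden, hidden, batch_first=True)   # sequence model
        self.au_head = nn.Linear(hidden, num_aus)             # multi-label AUs
        self.expr_head = nn.Linear(hidden, num_expr)          # single-label expression

    def forward(self, vis, aud):
        # vis: (B, T, vis_dim) frame features; aud: (B, T, aud_dim) audio features
        x = torch.relu(self.fuse(torch.cat([vis, aud], dim=-1)))
        x, _ = self.seq(x)
        return self.au_head(x), self.expr_head(x)

# Joint training with both annotation types: BCE for AUs, cross-entropy for
# expressions, summed with equal weights (an assumption).
model = MultiModalMultiTask()
vis, aud = torch.randn(2, 16, 512), torch.randn(2, 16, 128)
au_logits, expr_logits = model(vis, aud)
au_loss = nn.BCEWithLogitsLoss()(au_logits,
                                 torch.randint(0, 2, (2, 16, 12)).float())
expr_loss = nn.CrossEntropyLoss()(expr_logits.reshape(-1, 7),
                                  torch.randint(0, 7, (2 * 16,)))
loss = au_loss + expr_loss
```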

Cited by 8 publications (13 citation statements) | References 16 publications
“…Since we trained the model for a single task and did not use any audio or video features, its performance is not as good as that of teams using multi-task learning with video features.

Team | F1 | Accuracy | Total
[27] | 0.29 | 0.6491 | 0.4082
NTUA-CVSP [28] | 0.3367 | 0.6418 | 0.4374
Morphoboid [29] | 0.3511 | 0.668 | 0.4556
FLAB2021 [30] | 0.4079 | 0.6729 | 0.4953
STAR [31] | 0.4759 | 0.7321 | 0.5604
Maybe Next Time [32] | 0.6046 | 0.7289 | 0.6456
CPIC-DIR2021 [33] | 0.6834 | 0.7709 | 0.7123
Netease Fuxi Virtual Human [34] | 0.763 | 0.8059 | 0.7777
Ours [18] | 0.361 | 0.675 | 0.4646

Table 3 shows the influence of the number of networks that are collaboratively trained in CCT. It can be observed that the model with 3 networks performs best in the presence of noise.…”
Section: Performance Comparison With State-of-the-art Methods
Mentioning confidence: 99%
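The column labels in the reconstructed table above are an inference, not stated in the quote: the last value per row matches 0.67 · F1 + 0.33 · accuracy within rounding, which is the combined score used for the ABAW2 expression track. A quick sanity check in Python:

```python
import math

# Check the inferred column reading (F1, accuracy, total):
# total = 0.67 * F1 + 0.33 * accuracy, the ABAW2 expression-track score.
# Rows are taken from the table above; agreement is within rounding.
rows = [(0.3367, 0.6418, 0.4374),   # NTUA-CVSP [28]
        (0.4759, 0.7321, 0.5604),   # STAR [31]
        (0.763, 0.8059, 0.7777)]    # Netease Fuxi Virtual Human [34]
for f1, acc, total in rows:
    assert math.isclose(0.67 * f1 + 0.33 * acc, total, abs_tol=1e-3)
```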
“…The third ABAW Competition, to be held in conjunction with the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022, is a continuation of the first [24] and second [32] ABAW Competitions, held in conjunction with the IEEE Conference on Face and Gesture Recognition (IEEE FG) 2020 and with the International Conference on Computer Vision (ICCV) 2021, respectively. Those competitions targeted dimensional affect (in terms of valence and arousal) [2-4, 8, 9, 11, 21, 35, 39, 47, 48, 50, 54-56], categorical affect (in terms of the basic expressions) [12, 15, 16, 33, 36, 37, 51], and facial action unit analysis and recognition [7, 19, 20, 25, 26, 40, 44, 47]. The third ABAW Competition contains four Challenges, all based on the same in-the-wild database: (i) the uni-task Valence-Arousal Estimation Challenge; (ii) the uni-task Expression Classification Challenge (for the 6 basic expressions, plus the neutral state, plus the 'other' category denoting expressions/affective states other than the 6 basic ones); (iii) the uni-task Action Unit Detection Challenge (for 12 action units); and (iv) the Multi-Task Learning Challenge (for jointly learning and predicting valence-arousal, 8 expressions (6 basic plus neutral plus 'other'), and 12 action units).…”
Section: Introduction
Mentioning confidence: 99%
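For concreteness, the label spaces the quoted passage enumerates can be written down as follows. The specific list of 12 AUs is the standard Aff-Wild2 annotation set and is an assumption here, since the quote does not name them.

```python
# Label spaces of the four ABAW Challenges described above. The concrete AU
# numbers are the Aff-Wild2 annotation set, assumed here (not in the quote).
VALENCE_AROUSAL_RANGE = (-1.0, 1.0)  # (i) two continuous values per frame
EXPRESSIONS = ["neutral", "anger", "disgust", "fear",
               "happiness", "sadness", "surprise", "other"]  # (ii) 8 classes
ACTION_UNITS = [1, 2, 4, 6, 7, 10, 12, 15, 23, 24, 25, 26]  # (iii) 12 AUs
# (iv) the Multi-Task Learning Challenge predicts all three outputs jointly.
```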
“…Many facial analysis tasks, such as face recognition and age and gender prediction, have reached accuracy high enough for many practical applications [1,18]. However, the ability to understand human emotions is still far from maturity [6]. Personal biases and backgrounds increase the uncertainty of emotion perception and of contextual information [3].…”
Section: Introduction
Mentioning confidence: 99%
“…Though FER has been a topic of major research [13], many models learn features specific to a particular dataset, which is not practical for in-the-wild settings [6]. The development of in-the-wild affect prediction engines has been accelerated by a series of ABAW (Affective Behavior Analysis in-the-wild) competitions [8,14].…”
Section: Introduction
Mentioning confidence: 99%