2019
DOI: 10.1016/j.imavis.2018.10.002
Learning facial action units with spatiotemporal cues and multi-label sampling

Abstract: Facial action units (AUs) may be represented spatially, temporally, and in terms of their correlation. Previous research focuses on one or another of these aspects or addresses them disjointly. We propose a hybrid network architecture that jointly models spatial and temporal representations and their correlation. In particular, we use a Convolutional Neural Network (CNN) to learn spatial representations, and a Long Short-Term Memory (LSTM) to model temporal dependencies among them. The outputs of CNNs and LSTM…
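The abstract describes a hybrid in which a CNN learns per-frame spatial representations and an LSTM models temporal dependencies among them, with both streams combined for multi-label AU prediction. The pattern can be sketched in PyTorch; the layer sizes, fusion by concatenation, and AU count below are illustrative assumptions, not the paper's configuration:

```python
import torch
import torch.nn as nn

class CNNLSTMSketch(nn.Module):
    """Illustrative CNN + LSTM hybrid for multi-label AU detection.
    All dimensions are hypothetical, not the paper's architecture."""
    def __init__(self, num_aus=12, feat_dim=64):
        super().__init__()
        # small CNN: per-frame spatial representation
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim),
        )
        # LSTM: temporal dependencies among frame features
        self.lstm = nn.LSTM(feat_dim, feat_dim, batch_first=True)
        # fuse spatial and temporal streams for per-frame AU logits
        self.head = nn.Linear(2 * feat_dim, num_aus)

    def forward(self, clips):                      # clips: (B, T, 3, H, W)
        b, t = clips.shape[:2]
        feats = self.cnn(clips.flatten(0, 1)).view(b, t, -1)
        temporal, _ = self.lstm(feats)
        fused = torch.cat([feats, temporal], dim=-1)
        return self.head(fused)                    # (B, T, num_aus)

model = CNNLSTMSketch()
logits = model(torch.randn(2, 8, 3, 64, 64))
print(logits.shape)  # torch.Size([2, 8, 12])
```

Per-frame logits can then be trained with a multi-label loss such as `nn.BCEWithLogitsLoss`, since several AUs may be active in the same frame.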

Cited by 15 publications (7 citation statements)
References 26 publications
“…Early FEA work often included a computationally intensive and laborious process (e.g., face and facial landmark detection, hand-crafted feature extraction, and limited classification methods). Nowadays, researchers benefit from having access to comprehensive, large-scale facial datasets, as well as advanced computing resources to develop more efficient facial analysis methods [68,84,85,110,118,120].…”
Section: Jointly Estimating Landmark Detection and Action Unit Intensity
confidence: 99%
“…Recently, multi-label stratified sampling was found advantageous over naive sampling strategies for AU detection (Chu et al, 2019). In this experiment, we employed this strategy and investigated the effect of different training set sizes on the performance.…”
Section: Training Set Size
confidence: 99%
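The multi-label stratified sampling referenced above can be illustrated with a simplified greedy sampler that keeps rare AU labels represented before common ones. This is an assumption-laden sketch of the general idea, not the exact procedure of Chu et al.:

```python
import random
from collections import defaultdict

def multi_label_sample(labels, n, seed=0):
    """Greedy multi-label stratified sampling (illustrative only).

    labels: list of sets; labels[i] = AUs active in frame i
    n: number of frames to draw
    Returns sorted frame indices, favouring under-represented labels.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for i, aus in enumerate(labels):
        for au in aus:
            by_label[au].append(i)
    chosen, counts = set(), defaultdict(int)
    n = min(n, len(labels))
    while len(chosen) < n:
        if not by_label:
            # only unlabeled / exhausted frames remain; fill randomly
            rest = [i for i in range(len(labels)) if i not in chosen]
            chosen.update(rng.sample(rest, n - len(chosen)))
            break
        # pick the label with the fewest frames sampled so far
        au = min(by_label, key=lambda a: counts[a])
        pool = [i for i in by_label[au] if i not in chosen]
        if not pool:
            del by_label[au]        # label exhausted; stop tracking it
            continue
        pick = rng.choice(pool)
        chosen.add(pick)
        for a in labels[pick]:
            counts[a] += 1
    return sorted(chosen)
```

For example, with `labels = [{1}, {1}, {2}, {1, 2}, {3}]` and `n = 3`, the rare label 3 is always represented in the sample, whereas naive uniform sampling could easily miss it.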
“…Li et al [11] predicted AUs through multi-label learning and optimal temporal fusion using a long short-term memory (LSTM) network. Chu et al [12] proposed a structure comprising a 2D CNN and an LSTM network, where the 2D CNN learns spatial representations and the LSTM models their temporal dependencies.…”
Section: Related Work
confidence: 99%
“…In this way, l/32 × l/32 local features are obtained; local relationship learning with a bidirectional LSTM (BLSTM) structure then produces a feature vector, which is fed into fully connected layers to regress the AU intensity. Considering methods [11] and [12] and the results of preliminary experiments, the depth of the BLSTM network is set to 2.…”
Section: Local Relationship Learning
confidence: 99%
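The passage above feeds a sequence of local patch features through a 2-layer bidirectional LSTM and regresses AU intensity with fully connected layers. A minimal sketch of that structure, where the feature dimension, hidden size, patch count, AU count, and the choice to summarize the sequence by its last step are all hypothetical:

```python
import torch
import torch.nn as nn

class LocalBLSTMSketch(nn.Module):
    """Illustrative local-relationship model: a 2-layer BLSTM over
    local features, then fully connected layers regressing AU
    intensities. Dimensions are assumptions, not the paper's."""
    def __init__(self, feat_dim=128, hidden=64, num_aus=12):
        super().__init__()
        self.blstm = nn.LSTM(feat_dim, hidden, num_layers=2,
                             bidirectional=True, batch_first=True)
        self.fc = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_aus),
        )

    def forward(self, local_feats):    # (B, num_patches, feat_dim)
        out, _ = self.blstm(local_feats)
        # summarize the patch sequence with its final step
        return self.fc(out[:, -1])     # (B, num_aus)

# batch of 4 faces, 16 local patches, 128-d features each
intensities = LocalBLSTMSketch()(torch.randn(4, 16, 128))
```

Because the LSTM is bidirectional, each patch's output already mixes context from patches on both sides, which is what makes the final step a workable (if crude) sequence summary.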