2019 8th International Conference on Affective Computing and Intelligent Interaction (ACII)
DOI: 10.1109/acii.2019.8925514
FACS3D-Net: 3D Convolution based Spatiotemporal Representation for Action Unit Detection

Cited by 27 publications (15 citation statements) · References 23 publications
“…In a recent study, Yang et al. (2019) proposed to model spatiotemporal information by combining a 2D-CNN with a 3D-CNN for frame-level AU detection. However, whole video sequences are fed as input to the 3D-CNN part to provide summary information about the entire video while modeling each frame.…”
Section: Using Dynamics for AU Detection (mentioning)
confidence: 99%
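The fusion scheme this citation describes, a 2D-CNN producing per-frame spatial features while a 3D-CNN fed the whole clip provides one sequence-level summary, can be sketched as below. This is a minimal PyTorch illustration of that general idea, not the published FACS3D-Net code; the layer sizes, the fusion-by-concatenation choice, and the class name SpatioTemporalAUNet are all assumptions.

# Minimal sketch (not the authors' code): a 2D-CNN extracts per-frame
# spatial features, a 3D-CNN summarizes the whole clip, and the two are
# concatenated per frame for frame-level AU detection. All layer sizes
# and the fusion scheme are illustrative assumptions.
import torch
import torch.nn as nn

class SpatioTemporalAUNet(nn.Module):
    def __init__(self, num_aus=12):
        super().__init__()
        # Per-frame spatial branch (2D convolutions over H x W).
        self.cnn2d = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # -> (B*T, 64, 1, 1)
        )
        # Sequence-level temporal branch (3D convolutions over T x H x W);
        # the whole clip is fed in, yielding one summary vector per video.
        self.cnn3d = nn.Sequential(
            nn.Conv3d(1, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),  # -> (B, 64, 1, 1, 1)
        )
        self.head = nn.Linear(64 + 64, num_aus)  # per-frame AU logits

    def forward(self, clip):
        # clip: (B, 1, T, H, W) grayscale video
        b, c, t, h, w = clip.shape
        frames = clip.permute(0, 2, 1, 3, 4).reshape(b * t, c, h, w)
        spatial = self.cnn2d(frames).flatten(1).view(b, t, -1)  # (B, T, 64)
        summary = self.cnn3d(clip).flatten(1)                   # (B, 64)
        # Broadcast the clip summary to every frame before classifying,
        # so each frame's prediction sees sequence-level context.
        fused = torch.cat([spatial, summary.unsqueeze(1).expand(-1, t, -1)], dim=-1)
        return self.head(fused)                                 # (B, T, num_aus)

logits = SpatioTemporalAUNet()(torch.randn(2, 1, 16, 64, 64))
print(logits.shape)  # torch.Size([2, 16, 12])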
“…Most approaches, however, only focus on frame-based evaluation of facial actions, relying on analysing peak-intensity frames [23,32]. As a result, even though these approaches are able to detect strong AU activations in posed settings or when an expression is highly accentuated, they suffer when detecting more subtle expressions in spontaneous and naturalistic settings [38,42], challenging their real-world applicability. A prevailing requirement for automatic AU detection is to be sensitive to the said AU lifecycle and include temporal information, such as motion features or correlations amongst proximal frames, along with spatial features [19,36,42]. While spatial processing is important to determine relationships between different facial regions [19], understanding temporal correlations between their activation patterns in contiguous frames provides essential information about the AU lifecycle and can be particularly useful in detecting subtle activations [4,36,42].…”
Section: Introduction (mentioning)
confidence: 99%
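As a rough illustration of the "motion features or correlations amongst proximal frames" this excerpt calls for (an assumed example, not taken from any of the cited papers), the simplest such cue is a temporal difference of consecutive grayscale frames, which is near zero for a static face and highlights regions where an AU is evolving:

# Hedged illustration, not from the cited papers: absolute differences
# between consecutive frames as a crude per-pixel motion feature.
import numpy as np

def frame_differences(clip: np.ndarray) -> np.ndarray:
    """clip: (T, H, W) grayscale video -> (T-1, H, W) motion maps."""
    return np.abs(np.diff(clip.astype(np.float32), axis=0))

clip = np.random.rand(16, 192, 192).astype(np.float32)
print(frame_differences(clip).shape)  # (15, 192, 192)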
“…Implementation Details: For each image frame, we perform a similarity transformation that includes rotation, uniform scaling, and translation to obtain a 3 × 192 × 192 color face image. Because AU intensity is independent of facial color, the normalized RGB images are converted to grayscale to increase training efficiency [18]. The network requires all input image sequences to have the same number of frames.…”
Section: Datasets and Settings (mentioning)
confidence: 99%
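The quoted recipe (similarity alignment to 192 × 192, grayscale conversion, fixed-length sequences) can be sketched as below. This is a minimal OpenCV illustration under stated assumptions: the two-eye-landmark alignment, the canonical eye template, and the 16-frame target length are all hypothetical choices, not details given in the excerpt.

# Minimal preprocessing sketch of the quoted pipeline; landmark source,
# template eye positions, and sequence length are assumptions.
import cv2
import numpy as np

SIZE = 192
# Assumed canonical eye locations in the 192 x 192 output frame.
TEMPLATE = np.float32([[0.30 * SIZE, 0.35 * SIZE],
                       [0.70 * SIZE, 0.35 * SIZE]])

def normalize_frame(bgr: np.ndarray, eyes: np.ndarray) -> np.ndarray:
    """Similarity-align (rotation + uniform scale + translation) a face
    to 192 x 192 using two eye landmarks, then convert to grayscale."""
    m, _ = cv2.estimateAffinePartial2D(eyes.astype(np.float32), TEMPLATE)
    aligned = cv2.warpAffine(bgr, m, (SIZE, SIZE))
    return cv2.cvtColor(aligned, cv2.COLOR_BGR2GRAY)

def fix_length(frames: list, t: int = 16) -> np.ndarray:
    """Pad (by repeating the last frame) or truncate so every input
    sequence has the same number of frames, as the network requires."""
    frames = frames[:t] + [frames[-1]] * max(0, t - len(frames))
    return np.stack(frames)  # (t, 192, 192)

Repeating the last frame is only one way to equalize sequence lengths; uniform temporal resampling would serve the same purpose and may suit longer videos better.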