2019 8th International Conference on Affective Computing and Intelligent Interaction (ACII)
DOI: 10.1109/acii.2019.8925514
FACS3D-Net: 3D Convolution based Spatiotemporal Representation for Action Unit Detection

Cited by 27 publications (15 citation statements) · References 23 publications
“…In a recent study, Yang et al. (2019) proposed to model spatiotemporal information by combining a 2D-CNN with a 3D-CNN for frame-level AU detection. However, whole video sequences are fed as input to the 3D-CNN part to provide summary information about the entire video while modeling each frame.…”
Section: Using Dynamics for AU Detection (mentioning)
confidence: 99%
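The fusion scheme this citation describes, a 2D-CNN producing per-frame spatial features while a 3D-CNN fed the whole clip provides one sequence-level summary, can be sketched as below. This is a minimal PyTorch illustration of that general idea, not the published FACS3D-Net code; the layer sizes, the fusion-by-concatenation choice, and the class name SpatioTemporalAUNet are all assumptions.

# Minimal sketch (not the authors' code): a 2D-CNN extracts per-frame
# spatial features, a 3D-CNN summarizes the whole clip, and the two are
# concatenated per frame for frame-level AU detection. All layer sizes
# and the fusion scheme are illustrative assumptions.
import torch
import torch.nn as nn

class SpatioTemporalAUNet(nn.Module):
    def __init__(self, num_aus=12):
        super().__init__()
        # Per-frame spatial branch (2D convolutions over H x W).
        self.cnn2d = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # -> (B*T, 64, 1, 1)
        )
        # Sequence-level temporal branch (3D convolutions over T x H x W);
        # the whole clip is fed in, yielding one summary vector per video.
        self.cnn3d = nn.Sequential(
            nn.Conv3d(1, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),  # -> (B, 64, 1, 1, 1)
        )
        self.head = nn.Linear(64 + 64, num_aus)  # per-frame AU logits

    def forward(self, clip):
        # clip: (B, 1, T, H, W) grayscale video
        b, c, t, h, w = clip.shape
        frames = clip.permute(0, 2, 1, 3, 4).reshape(b * t, c, h, w)
        spatial = self.cnn2d(frames).flatten(1).view(b, t, -1)  # (B, T, 64)
        summary = self.cnn3d(clip).flatten(1)                   # (B, 64)
        # Broadcast the clip summary to every frame before classifying,
        # so each frame's prediction sees sequence-level context.
        fused = torch.cat([spatial, summary.unsqueeze(1).expand(-1, t, -1)], dim=-1)
        return self.head(fused)                                 # (B, T, num_aus)

logits = SpatioTemporalAUNet()(torch.randn(2, 1, 16, 64, 64))
print(logits.shape)  # torch.Size([2, 16, 12])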
“…Most approaches, however, only focus on frame-based evaluation of facial actions, relying on analysing peak-intensity frames [23,32]. As a result, even though these approaches are able to detect strong AU activations in posed settings or when an expression is highly accentuated, they suffer when detecting more subtle expressions in spontaneous and naturalistic settings [38,42], challenging their real-world applicability. A prevailing requirement for automatic AU detection is to be sensitive to the said AU lifecycle and include temporal information, such as motion features or correlations amongst proximal frames, along with spatial features [19,36,42]. While spatial processing is important to determine relationships between different facial regions [19], understanding temporal correlations between their activation patterns in contiguous frames provides essential information about the AU lifecycle and can be particularly useful in detecting subtle activations [4,36,42].…”
Section: Introduction (mentioning)
confidence: 99%
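As a rough illustration of the "motion features or correlations amongst proximal frames" this excerpt calls for (an assumed example, not taken from any of the cited papers), the simplest such cue is a temporal difference of consecutive grayscale frames, which is near zero for a static face and highlights regions where an AU is evolving:

# Hedged illustration, not from the cited papers: absolute differences
# between consecutive frames as a crude per-pixel motion feature.
import numpy as np

def frame_differences(clip: np.ndarray) -> np.ndarray:
    """clip: (T, H, W) grayscale video -> (T-1, H, W) motion maps."""
    return np.abs(np.diff(clip.astype(np.float32), axis=0))

clip = np.random.rand(16, 192, 192).astype(np.float32)
print(frame_differences(clip).shape)  # (15, 192, 192)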
“…Implementation Details: For each image frame, we perform a similarity transformation that includes rotation, uniform scaling, and translation to obtain a 3 × 192 × 192 color face image. Because AU intensity is independent of facial color, the normalized RGB images are converted to grayscale to increase training efficiency [18]. The network requires all input image sequences to have the same number of frames.…”
Section: Datasets and Settings (mentioning)
confidence: 99%
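The quoted recipe (similarity alignment to 192 × 192, grayscale conversion, fixed-length sequences) can be sketched as below. This is a minimal OpenCV illustration under stated assumptions: the two-eye-landmark alignment, the canonical eye template, and the 16-frame target length are all hypothetical choices, not details given in the excerpt.

# Minimal preprocessing sketch of the quoted pipeline; landmark source,
# template eye positions, and sequence length are assumptions.
import cv2
import numpy as np

SIZE = 192
# Assumed canonical eye locations in the 192 x 192 output frame.
TEMPLATE = np.float32([[0.30 * SIZE, 0.35 * SIZE],
                       [0.70 * SIZE, 0.35 * SIZE]])

def normalize_frame(bgr: np.ndarray, eyes: np.ndarray) -> np.ndarray:
    """Similarity-align (rotation + uniform scale + translation) a face
    to 192 x 192 using two eye landmarks, then convert to grayscale."""
    m, _ = cv2.estimateAffinePartial2D(eyes.astype(np.float32), TEMPLATE)
    aligned = cv2.warpAffine(bgr, m, (SIZE, SIZE))
    return cv2.cvtColor(aligned, cv2.COLOR_BGR2GRAY)

def fix_length(frames: list, t: int = 16) -> np.ndarray:
    """Pad (by repeating the last frame) or truncate so every input
    sequence has the same number of frames, as the network requires."""
    frames = frames[:t] + [frames[-1]] * max(0, t - len(frames))
    return np.stack(frames)  # (t, 192, 192)

Repeating the last frame is only one way to equalize sequence lengths; uniform temporal resampling would serve the same purpose and may suit longer videos better.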