2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW)
DOI: 10.1109/iccvw.2019.00200

Dynamic Facial Models for Video-Based Dimensional Affect Estimation

Abstract: Dimensional affect estimation from a face video is a challenging task, mainly due to the large number of possible facial displays made up of a set of behaviour primitives, including facial muscle actions. The displays vary not only in composition but also in temporal evolution, with each display composed of behaviour primitives that vary in their short- and long-term characteristics. Most existing work modelling affect relies on complex hierarchical recurrent models unable to capture short-term dynamics well. In…
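For orientation, below is a minimal sketch of the kind of frame-level recurrent pipeline the abstract alludes to: per-frame CNN features fed to a GRU that regresses valence and arousal per frame. This is a generic illustration only; the model class, dimensions, and the name AffectRegressor are assumptions, not the paper's proposed method (which the truncated abstract does not specify).

import torch
import torch.nn as nn

# Generic recurrent affect regressor (assumed, illustrative): maps a sequence
# of per-frame feature vectors to per-frame (valence, arousal) estimates.
class AffectRegressor(nn.Module):
    def __init__(self, feat_dim=512, hidden=128):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)        # valence and arousal

    def forward(self, frame_feats):             # (batch, time, feat_dim)
        states, _ = self.gru(frame_feats)       # hidden state per frame
        return torch.tanh(self.head(states))    # (batch, time, 2), values in [-1, 1]

model = AffectRegressor()
feats = torch.randn(4, 100, 512)                # 4 clips, 100 frames of features each
valence_arousal = model(feats)                  # frame-level affect predictions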

Cited by 17 publications (8 citation statements). References 44 publications.
“…In aiming to construct a video-level descriptor, the first task is to reduce the dimensionality. Current studies either extract hand-crafted features [38], [41], [61] or deep-learned features [18], [20], [42] to represent each frame or short video segment. Traditional hand-crafted features, e.g.…”
Section: Human Behaviour Primitives Extraction (mentioning; confidence: 99%)
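As a concrete reading of the "deep-learned features" route in the statement above, the sketch below maps each frame to a 512-D vector with a pretrained ResNet-18 whose classifier is removed, then mean-pools the frame features into one video-level descriptor. The backbone choice and the mean pooling are assumptions for illustration; the cited works [18], [20], [42] use their own architectures and aggregation schemes.

import torch
import torchvision.models as models

# Assumed setup: pretrained ResNet-18 as a frame-level feature extractor.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()          # drop the classifier, keep 512-D features
backbone.eval()

frames = torch.randn(100, 3, 224, 224)     # one video: 100 preprocessed RGB frames
with torch.no_grad():
    per_frame = backbone(frames)           # (100, 512) frame-level features
video_descriptor = per_frame.mean(dim=0)   # (512,) crude video-level descriptor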
“…Subsequently, the video-level prediction is made by these selected frames. Beyan et al [27] propose to generate multiple dynamic facial images [39], [40], [41] to represent each video segment and then choose a set of dynamic facial images that have the highest spatio-temporal saliency as the key frames to construct the video-level representation.…”
Section: Audio-visual Automatic Personality Analysis (mentioning; confidence: 99%)
“…Subsequently, the video-level prediction is made by these selected frames. Beyan et al [7] propose to generate multiple dynamic facial images [9,82,83] to represent each video segment and then choose a set of dynamic facial images that have the highest spatio-temporal saliency as the key frames to construct the video-level representation.…”
Section: Audio-visual Automatic Personality Analysis (mentioning; confidence: 99%)
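A rough sketch of the strategy described in the two statements above, under simplifying assumptions: each segment is summarized as a dynamic image using a linear approximation of rank pooling (weights alpha_t = 2t - T - 1, in the spirit of Bilen et al.'s dynamic images), and the mean absolute intensity of the dynamic image stands in for the spatio-temporal saliency score, which Beyan et al. define differently.

import numpy as np

def dynamic_image(frames: np.ndarray) -> np.ndarray:
    """Collapse (T, H, W, C) frames into one (H, W, C) dynamic image
    via simplified linear rank-pooling weights (an approximation)."""
    T = frames.shape[0]
    alphas = 2.0 * np.arange(1, T + 1) - T - 1
    return np.tensordot(alphas, frames, axes=1)

def select_key_segments(video: np.ndarray, seg_len: int, k: int):
    """Split the video into fixed-length segments, build one dynamic image
    per segment, and keep the k highest-scoring ones."""
    segs = [video[i:i + seg_len]
            for i in range(0, len(video) - seg_len + 1, seg_len)]
    dis = [dynamic_image(s) for s in segs]
    scores = [np.abs(d).mean() for d in dis]   # assumed stand-in for saliency
    top = np.argsort(scores)[::-1][:k]
    return [dis[i] for i in sorted(top)]

video = np.random.rand(120, 64, 64, 3).astype(np.float32)
key_dynamic_images = select_key_segments(video, seg_len=10, k=3)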