2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW)
DOI: 10.1109/iccvw.2019.00200

Dynamic Facial Models for Video-Based Dimensional Affect Estimation

Abstract: Dimensional affect estimation from a face video is a challenging task, mainly due to the large number of possible facial displays made up of a set of behaviour primitives, including facial muscle actions. The displays vary not only in composition but also in temporal evolution, with each display composed of behaviour primitives that vary in their short- and long-term characteristics. Most existing work modelling affect relies on complex hierarchical recurrent models unable to capture short-term dynamics well. In…
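For orientation, below is a minimal sketch of the kind of frame-level recurrent pipeline the abstract alludes to: per-frame CNN features fed to a GRU that regresses valence and arousal per frame. This is a generic illustration only; the model class, dimensions, and the name AffectRegressor are assumptions, not the paper's proposed method (which the truncated abstract does not specify).

import torch
import torch.nn as nn

# Generic recurrent affect regressor (assumed, illustrative): maps a sequence
# of per-frame feature vectors to per-frame (valence, arousal) estimates.
class AffectRegressor(nn.Module):
    def __init__(self, feat_dim=512, hidden=128):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)        # valence and arousal

    def forward(self, frame_feats):             # (batch, time, feat_dim)
        states, _ = self.gru(frame_feats)       # hidden state per frame
        return torch.tanh(self.head(states))    # (batch, time, 2), values in [-1, 1]

model = AffectRegressor()
feats = torch.randn(4, 100, 512)                # 4 clips, 100 frames of features each
valence_arousal = model(feats)                  # frame-level affect predictions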

Cited by 17 publications (8 citation statements). References 44 publications.
“…In aiming to construct a video-level descriptor, the first task is to reduce the dimensionality. Current studies either extract hand-crafted features [38], [41], [61] or deep-learned features [18], [20], [42] to represent each frame or short video segment. Traditional hand-crafted features, e.g.…”
Section: Human Behaviour Primitives Extraction (mentioning; confidence: 99%)
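As a concrete reading of the "deep-learned features" route in the statement above, the sketch below maps each frame to a 512-D vector with a pretrained ResNet-18 whose classifier is removed, then mean-pools the frame features into one video-level descriptor. The backbone choice and the mean pooling are assumptions for illustration; the cited works [18], [20], [42] use their own architectures and aggregation schemes.

import torch
import torchvision.models as models

# Assumed setup: pretrained ResNet-18 as a frame-level feature extractor.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()          # drop the classifier, keep 512-D features
backbone.eval()

frames = torch.randn(100, 3, 224, 224)     # one video: 100 preprocessed RGB frames
with torch.no_grad():
    per_frame = backbone(frames)           # (100, 512) frame-level features
video_descriptor = per_frame.mean(dim=0)   # (512,) crude video-level descriptor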
“…Subsequently, the video-level prediction is made by these selected frames. Beyan et al [27] propose to generate multiple dynamic facial images [39], [40], [41] to represent each video segment and then choose a set of dynamic facial images that have the highest spatio-temporal saliency as the key frames to construct the video-level representation.…”
Section: Audio-visual Automatic Personality Analysis (mentioning; confidence: 99%)
“…Subsequently, the video-level prediction is made by these selected frames. Beyan et al [7] propose to generate multiple dynamic facial images [9,82,83] to represent each video segment and then choose a set of dynamic facial images that have the highest spatio-temporal saliency as the key frames to construct the video-level representation.…”
Section: Audio-visual Automatic Personality Analysis (mentioning; confidence: 99%)
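A rough sketch of the strategy described in the two statements above, under simplifying assumptions: each segment is summarized as a dynamic image using a linear approximation of rank pooling (weights alpha_t = 2t - T - 1, in the spirit of Bilen et al.'s dynamic images), and the mean absolute intensity of the dynamic image stands in for the spatio-temporal saliency score, which Beyan et al. define differently.

import numpy as np

def dynamic_image(frames: np.ndarray) -> np.ndarray:
    """Collapse (T, H, W, C) frames into one (H, W, C) dynamic image
    via simplified linear rank-pooling weights (an approximation)."""
    T = frames.shape[0]
    alphas = 2.0 * np.arange(1, T + 1) - T - 1
    return np.tensordot(alphas, frames, axes=1)

def select_key_segments(video: np.ndarray, seg_len: int, k: int):
    """Split the video into fixed-length segments, build one dynamic image
    per segment, and keep the k highest-scoring ones."""
    segs = [video[i:i + seg_len]
            for i in range(0, len(video) - seg_len + 1, seg_len)]
    dis = [dynamic_image(s) for s in segs]
    scores = [np.abs(d).mean() for d in dis]   # assumed stand-in for saliency
    top = np.argsort(scores)[::-1][:k]
    return [dis[i] for i in sorted(top)]

video = np.random.rand(120, 64, 64, 3).astype(np.float32)
key_dynamic_images = select_key_segments(video, seg_len=10, k=3)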