Self-Supervised Learning of Person-Specific Facial Dynamics for Automatic Personality Recognition

Song, Siyang; Jaiswal, Shashank; Sánchez, Ernesto Sánchez; Tzimiropoulos, Georgios; Shen, Linlin; Valstar, Michel

doi:10.1109/taffc.2021.3064601

Cited by 29 publications

(27 citation statements)

References 90 publications

(130 reference statements)


“…Novelty: The main novelties of the proposed approach are summarised as follows: firstly, we propose to use the simulated human cognition as the source descriptor to recognise true personality traits, which differs from existing approaches [7,16,33,52,70,80,94,98] that predict apparent personality traits directly from target subjects' expressive behaviours. Secondly, we propose the first non-invasive approach that simulates human person-specific cognitive processes that relate to facial reactions.…”

Section: Methods (mentioning)

confidence: 99%

“…In summary, while modelling personality traits at the frame/segment-level is problematic, the recent clip-level representations usually failed to utilise the full scale of the available information in the data, as they select a subset or key frames to represent an entire video. To avoid these problems, Song et al. [80] propose a domain adaptation approach to learn a set of intermediate convolution layers from all available data as the person-specific representation for the target subject, which achieved a comparable performance to the state-of-the-art method [52]. However, similar to the approaches described above, it still directly infers apparent personality based on the subjects' observable behaviours.…”

Section: Audio-visual Automatic Personality Analysis (mentioning)

confidence: 99%

“…Recent advances in machine learning (ML) have enabled the development of non-invasive automatic personality traits analysers that recognise subjects' personality traits from their audiovisual non-verbal behaviours [16,28,52,80,90,98] as there is solid psychological and biological evidence [19,27,48,95] claiming that nonverbal behaviours are reliable predictors of personality. In most of these approaches, ML models are trained with the personality labels provided by the external observers (annotators), and they therefore output their perception of the target subjects' personality.…”

Section: Introduction (mentioning)

confidence: 99%


Learning Graph Representation of Person-specific Cognitive Processes from Audio-visual Behaviours for Automatic Personality Recognition

Song, Shao, Jaiswal et al. 2021

Preprint

Self Cite


This paper proposes to recognise the true (self-reported) personality from the learned simulation of the target subject's cognition. This approach builds on the following two findings in cognitive science: (i) human cognition partially determines expressed behaviour and is directly linked to true personality traits; and (ii) in dyadic interactions individuals' nonverbal behaviours are influenced by their conversational partner's behaviours. In this context, we hypothesise that during a dyadic interaction, a target subject's facial reactions are driven by two main factors, i.e. their internal (person-specific) cognitive process, and the externalised nonverbal behaviours of their conversational partner. Consequently, we propose to represent the target subject's (defined as the listener) person-specific cognition in the form of a person-specific CNN architecture that has unique architectural parameters and depth, which takes audio-visual non-verbal cues displayed by the conversational partner (defined as the speaker) as input, and is able to reproduce the target subject's facial reactions. Each person-specific CNN is found via Neural Architecture Search (NAS) with a novel adaptive loss function, and is then encoded as a graph representation for recognising the target subject's true personality. Experimental results not only show that the produced graph representations are well associated with target subjects' personality traits in both human-human and human-machine interaction scenarios, and outperform existing approaches by a significant margin, but also demonstrate that the proposed novel strategies, such as the adaptive loss and the end-to-end vertices/edges feature learning, help the proposed approach in learning more reliable personality representations.
Building on our earlier version of this work, this paper further proposes: (i) assigning a unique depth for each CNN; (ii) a novel end-to-end graph vertex feature learning strategy; (iii) a transformer-based edge feature learning strategy; and (iv) evaluating the approach in a human-machine interaction scenario.


Learning Graph Representation of Person-specific Cognitive Processes from Audio-visual Behaviours for Automatic Personality Recognition

Song, Shao, Jaiswal et al. 2021

Preprint

Self Cite


“…Each ERN used in our experiments is made up of two …

Metric | Method              | Ope    | Con    | Ext   | Agr    | Neu   | Avg.
ACC    | Spectral [45]       | 0.752  | 0.807  | 0.849 | 0.800  | 0.788 | 0.799
ACC    | DCC [18]            | 0.755  | 0.787  | 0.772 | 0.736  | 0.791 | 0.768
ACC    | NJU-LAMDA [51]      | 0.741  | 0.826  | 0.827 | 0.753  | 0.789 | 0.787
ACC    | CR-Net [29]         | 0.830  | 0.876  | 0.904 | 0.887  | 0.903 | 0.880
ACC    | PALs [44]           | 0.845  | 0.819  | 0.916 | 0.837  | 0.911 | 0.866
ACC    | Ours (A-MModal (S)) | 0.833  | 0.890  | 0.913 | 0.869  | 0.917 | 0.884
ACC    | Ours (MModal (M))   | 0.889  | 0.925  | 0.923 | 0.913  | 0.921 | 0.914
ACC    | Ours (A-MModal (M)) | 0.882  | 0.925  | 0.931 | 0.912  | 0.925 | 0.915
PCC    | Spectral [45]       | -0.010 | 0.059  | 0.135 | 0.071  | 0.024 | 0.056
PCC    | DCC [18]            | -0.153 | -0.078 | 0.037 | -0.024 | 0.121 | 0.008
PCC    | NJU-LAMDA [51]      | …

MModal denotes the graph representations of multi-modal processors. (M) and (S) represent the multi-level and single-level fusion, respectively.…”

Section: Implementation Details (mentioning)

confidence: 99%
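The ACC and PCC figures quoted above can be reproduced in form, though not in value, with a few lines of NumPy. The sketch below is illustrative only: it assumes ACC means one minus the mean absolute error on labels scaled to [0, 1] (a common convention in apparent-personality benchmarks, which the excerpt itself does not confirm), and it takes PCC to be the Pearson correlation coefficient. The function names are hypothetical. In the ACC rows of the excerpt, the Avg. column agrees with the plain mean of the five trait scores.

```python
import numpy as np

TRAITS = ["Ope", "Con", "Ext", "Agr", "Neu"]

def mean_accuracy(y_true, y_pred):
    """Per-trait ACC, ASSUMED here to be 1 - mean absolute error (labels in [0, 1])."""
    return 1.0 - np.mean(np.abs(y_true - y_pred), axis=0)

def pearson_cc(y_true, y_pred):
    """Per-trait Pearson correlation between labels and predictions."""
    t = y_true - y_true.mean(axis=0)
    p = y_pred - y_pred.mean(axis=0)
    return (t * p).sum(axis=0) / np.sqrt((t ** 2).sum(axis=0) * (p ** 2).sum(axis=0))

def summarise(y_true, y_pred):
    """Collect per-trait scores plus their plain average, mirroring the table layout."""
    acc = mean_accuracy(y_true, y_pred)
    pcc = pearson_cc(y_true, y_pred)
    return {
        "ACC": dict(zip(TRAITS, acc)), "ACC_avg": float(acc.mean()),
        "PCC": dict(zip(TRAITS, pcc)), "PCC_avg": float(pcc.mean()),
    }

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    labels = rng.random((50, 5))                                   # synthetic labels, not real data
    preds = np.clip(labels + 0.1 * rng.normal(size=(50, 5)), 0.0, 1.0)
    print(summarise(labels, preds))
```

Arrays are shaped (subjects, traits), so every reduction runs over axis 0 and returns one score per trait.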

“…The problem with these approaches is that at the level of a single frame or short segment, even people with different personality traits may display very similar non-verbal audio-visual behaviours. Therefore, these training strategies would end up utilising the same input pattern with multiple labels, making it practically impossible to train a model that has a good generalization capability [44,45,47] (Problem 2). Although some approaches select a set of key frames to represent an entire video and infer personality from such video-level representations [4,29,53], they ignore the details contained in the discarded frames (Problem 3).…”

Section: Introduction (mentioning)

confidence: 99%

Personality Recognition by Modelling Person-specific Cognitive Processes using Graph Representation

Shao, Song, Jaiswal et al. 2021

Proceedings of the 29th ACM International Conference on Multimedia

Self Cite


Recent research shows that in dyadic and group interactions individuals' nonverbal behaviours are influenced by the behaviours of their conversational partner(s). Therefore, in this work we hypothesise that during a dyadic interaction, the target subject's facial reactions are driven by two main factors: (i) their internal (person-specific) cognition, and (ii) the externalised nonverbal behaviours of their conversational partner. Subsequently, our novel proposition is to simulate and represent the target subject's (i.e., the listener) cognitive process in the form of a person-specific CNN architecture whose input is the audio-visual non-verbal cues displayed by the conversational partner (i.e., the speaker), and the output is the target subject's (i.e., the listener) facial reactions. We then undertake a search for the optimal CNN architecture whose results are used to create a person-specific graph representation for recognising the target subject's personality. The graph representation, fortified with a novel end-to-end edge feature learning strategy, helps with retaining both the unique parameters of the person-specific CNN and the geometrical relationship between its layers. Consequently, the proposed approach is the first work that aims to recognize the true (self-reported) personality of a target subject (i.e., the listener) from the learned simulation of their cognitive process (i.e., parameters of the person-specific CNN). The experimental results show that the CNN architectures are well associated with target subjects' personality traits and the proposed approach clearly outperforms multiple existing approaches that predict personality directly from non-verbal behaviours. In light of these findings, this work opens up a new avenue of research for predicting and recognizing socioemotional phenomena (personality, affect, engagement etc.) from simulations of person-specific cognitive processes.
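To make the idea of a graph that keeps both a CNN's parameters and the relationship between its layers more concrete, here is a deliberately crude sketch. It is not the authors' method: the vertex features are just truncated or zero-padded flattened weights, the edges are a plain adjacency matrix, and `cnn_to_graph`, `feat_dim` and the toy layer shapes are all hypothetical. The abstract itself describes learned edge features, which this sketch does not attempt.

```python
import numpy as np

def cnn_to_graph(layer_weights, connections, feat_dim=16):
    """Turn a list of per-layer weight arrays into (vertex features, adjacency).

    ASSUMPTION: each vertex is one layer, described by its first `feat_dim`
    flattened weights (zero-padded if the layer is smaller). This is only a
    stand-in for a learned vertex feature.
    """
    n = len(layer_weights)
    vertices = np.zeros((n, feat_dim))
    for i, w in enumerate(layer_weights):
        flat = np.ravel(w)[:feat_dim]
        vertices[i, :flat.size] = flat
    # Directed edge src -> dst whenever layer src feeds layer dst.
    adjacency = np.zeros((n, n))
    for src, dst in connections:
        adjacency[src, dst] = 1.0
    return vertices, adjacency

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy "person-specific" network: three layers with a skip connection 0 -> 2.
    weights = [rng.normal(size=(3, 3, 3)), rng.normal(size=(2, 2)), rng.normal(size=(4, 4, 2))]
    v, a = cnn_to_graph(weights, connections=[(0, 1), (1, 2), (0, 2)])
    print(v.shape, a.shape)
```

Because two subjects' networks may differ in depth, the number of vertices differs per subject; a downstream graph model would have to accept variable-size graphs.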


Adaptive information fusion network for multi‐modal personality recognition

Bao, Liu, et al. 2024

Computer Animation & Virtual


Personality recognition is of great significance in deepening the understanding of social relations. While personality recognition methods have made significant strides in recent years, the heterogeneity between modalities during feature fusion remains an open challenge. This paper introduces an adaptive multi‐modal information fusion network (AMIF‐Net) capable of concurrently processing video, audio, and text data. First, the AMIF‐Net encoder processes the extracted audio and video features separately, capturing long‐term relationships in the data. Then, adaptive elements added to the fusion network alleviate the heterogeneity between modalities. Lastly, we concatenate the audio‐video and text features and feed them into a regression network to obtain Big Five personality trait scores. Furthermore, we introduce a novel loss function to address the problem of training inaccuracies, taking advantage of its unique property of exhibiting a peak at the critical mean. Our tests on the ChaLearn First Impressions V2 multi‐modal dataset show performance that partially surpasses state‐of‐the‐art networks.
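The pipeline in this abstract (fuse audio and video adaptively, concatenate with text, regress five trait scores) can be sketched in a few lines. The code below is not AMIF‐Net: the sigmoid gate is only one plausible reading of "adaptive elements", the single linear regression head and the [0, 1] output range are assumptions, and every name and dimension is hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(audio, video, w_gate, b_gate):
    """HYPOTHETICAL adaptive fusion: a gate in (0, 1) mixes the two modalities per feature."""
    gate = sigmoid(np.concatenate([audio, video], axis=-1) @ w_gate + b_gate)
    return gate * audio + (1.0 - gate) * video

def predict_big_five(audio, video, text, params):
    """Fuse audio and video, concatenate with text, regress five scores in (0, 1)."""
    av = gated_fusion(audio, video, params["w_gate"], params["b_gate"])
    fused = np.concatenate([av, text], axis=-1)
    return sigmoid(fused @ params["w_out"] + params["b_out"])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, d_text, batch = 8, 4, 3   # illustrative feature sizes only
    params = {
        "w_gate": rng.normal(size=(2 * d, d)), "b_gate": np.zeros(d),
        "w_out": rng.normal(size=(d + d_text, 5)), "b_out": np.zeros(5),
    }
    scores = predict_big_five(
        rng.normal(size=(batch, d)), rng.normal(size=(batch, d)),
        rng.normal(size=(batch, d_text)), params,
    )
    print(scores.shape)  # (3, 5): one row per clip, one column per trait
```

With the gate weights set to zero the gate is 0.5 everywhere and the fusion reduces to a plain average of the two modalities, which makes a convenient sanity check.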

