2016
DOI: 10.1007/978-3-319-49409-8_27

Bi-modal First Impressions Recognition Using Temporally Ordered Deep Audio and Stochastic Visual Features

Abstract: We propose a novel approach for First Impressions Recognition in terms of the Big Five personality traits from short videos. The Big Five model describes human personality using five broad categories: Extraversion, Agreeableness, Conscientiousness, Neuroticism and Openness. We train two bi-modal end-to-end deep neural network architectures using temporally ordered audio and novel stochastic visual features from few frames, without over-fitting. We empirically show that the trained mod…

Cited by 63 publications (47 citation statements)
References 9 publications
“…A similar work from [27] introduced a deep audio-visual residual network for multimodal personality trait recognition. Besides, [28] develop a volumetric convolution and Long-Short-Term-Memory (LSTM) based network to learn audiovisual temporal patterns. However, performances from all above-mentioned methods rely heavily on ensemble strategies and here we report better results with a single visual stream with PersEmoN.…”
Section: Deep Learning For Emotion Analysis (mentioning)
confidence: 99%
“…Regarding the recently proposed CNN based models for automatic personality perception [14], [15], [60], [62], we observed that there is still a long venue to be explored. The top three winner methods [14], [15], [62] submitted to the ChaLearn First Impression Challenge [9] obtained very similar overall performances (i.e., 0.913, 0.912 and 0.911, respectively) even though presenting different solutions, suggesting that proposed architectures may be exploiting complementary features [26], which could be combined somehow to improve overall accuracy. Moreover, deep neural networks are currently one of the most promising candidates to tackle the challenges of multimodal data fusion [14], [62], [65], [81] and multi-task solutions in first impressions.…”
Section: Discussion (mentioning)
confidence: 99%
“…At the training/test stage, the fully-connected layer outputs five continuous prediction values corresponding to each trait for the given input video clip. Their work won the third place in the ChaLearn First Impressions Challenge [9] (1st round), whereas [62] and [14] achieved the second and first place, respectively. The work [15] was extended in [8] to consider verbal content, and to predict an "invitation to job interview" variable.…”
Section: Non-interactive Settings (mentioning)
confidence: 99%