Bi-modal First Impressions Recognition Using Temporally Ordered Deep Audio and Stochastic Visual Features

Subramaniam, Arulkumar; Patel, Vismay; Mishra, Ashish; Balasubramanian, P.; Mittal, Anurag

doi:10.1007/978-3-319-49409-8_27

Cited by 63 publications

(47 citation statements)

References 9 publications


“…A similar work from [27] introduced a deep audio-visual residual network for multimodal personality trait recognition. Besides, [28] develop a volumetric convolution and Long-Short-Term-Memory (LSTM) based network to learn audiovisual temporal patterns. However, performances from all above-mentioned methods rely heavily on ensemble strategies and here we report better results with a single visual stream with PersEmoN.…”

Section: Deep Learning For Emotion Analysis (mentioning)

confidence: 99%

PersEmoN: A Deep Network for Joint Analysis of Apparent Personality, Emotion and Their Relationship

Zhang

Peng

Winkler

2022

IEEE Trans. Affective Comput.


Personality and emotion are both central to affective computing. Existing works solve them individually. In this paper we investigate whether such high-level affect traits and their relationship can be jointly learned from face images in the wild. To this end, we introduce PersEmoN, an end-to-end trainable, deep Siamese-like network. It consists of two convolutional network branches, one for emotion and the other for apparent personality, which we call the emotion network and the personality network, respectively. Both networks share their bottom feature extraction module and are optimized within a multi-task learning framework. The emotion and personality networks are each dedicated to their own annotated dataset. An adversarial-like loss function is further employed to promote representation coherence among heterogeneous dataset sources. Based on this, the emotion-to-personality relationship is also explored. Extensive experiments are provided to demonstrate the effectiveness of PersEmoN.


“…Regarding the recently proposed CNN based models for automatic personality perception [14], [15], [60], [62], we observed that there is still a long venue to be explored. The top three winner methods [14], [15], [62] submitted to the ChaLearn First Impression Challenge [9] obtained very similar overall performances (i.e., 0.913, 0.912 and 0.911, respectively) even though presenting different solutions, suggesting that proposed architectures may be exploiting complementary features [26], which could be combined somehow to improve overall accuracy. Moreover, deep neural networks are currently one of the most promising candidates to tackle the challenges of multimodal data fusion [14], [62], [65], [81] and multi-task solutions in first impressions.…”

Section: Discussion (mentioning)

confidence: 99%

“…At the training/test stage, the fully-connected layer outputs five continuous prediction values corresponding to each trait for the given input video clip. Their work won the third place in the ChaLearn First Impressions Challenge [9] (1st round), whereas [62] and [14] achieved the second and first place, respectively. The work [15] was extended in [8] to consider verbal content, and to predict an "invitation to job interview" variable.…”

Section: Non-interactive Settings (mentioning)

confidence: 99%

“…The winning methods were based on deep learning [14], [65]. In fact, most participants of the contest adopted deep learning methods (e.g., [15], [62]). The best performance was achieved by solutions that incorporated both audio and visual cues.…”

Section: Trait Recognition Challenges (mentioning)

confidence: 99%

“…Big-Five impressions [14], [15], [19], [62], [65], [81], [ [40], [57], [58], [59], [63], [114], [115] Emergent LEAder (ELEA) [70], Audiovisual 2012 40 meetings:~15min each, 27 having both audio and video, composed of 3 or 4 members, 148 participants; 6 static (25fps) and 2 portable (30fps) cameras, controlled environment…”

Section: Apparent Personality Trait and Hirability Impressions (mentioning)

confidence: 99%


First Impressions: A Survey on Vision-Based Apparent Personality Trait Analysis

Jacques

Güçlütürk

Pérez

et al. 2022

IEEE Trans. Affective Comput.


Personality analysis has been widely studied in psychology, neuropsychology, and signal processing, among other fields. In the past few years, it has also become an attractive research area in visual computing. From the computational point of view, speech and text have been by far the most considered cues of information for analyzing personality. However, recently there has been an increasing interest from the computer vision community in analyzing personality from visual data. Recent computer vision approaches are able to accurately analyze human faces, body postures and behaviors, and use this information to infer apparent personality traits. Because of the overwhelming research interest in this topic, and of the potential impact that this sort of method could have on society, we present in this paper an up-to-date review of existing vision-based approaches for apparent personality trait recognition. We describe seminal and cutting-edge works on the subject, discussing and comparing their distinctive features and limitations. Future avenues of research in the field are identified and discussed. Furthermore, aspects of the subjectivity in data labeling/evaluation, as well as current datasets and challenges organized to push research in the field, are reviewed.


Recognition of Urban Sound Events Using Deep Context-Aware Feature Extractors and Handcrafted Features

Γιαννακόπουλος

Spyrou

Perantonis

2019

IFIP Advances in Information and Communication Technology


This paper proposes a method for recognizing audio events in urban environments that combines handcrafted audio features with a deep learning architectural scheme (Convolutional Neural Networks, CNNs), which has been trained to distinguish between different audio context classes. The core idea is to use the CNNs as a method to extract context-aware deep audio features that can offer supplementary feature representations to any soundscape analysis classification task. Towards this end, the CNN is trained on a database of audio samples which are annotated in terms of their respective "scene" (e.g. train, street, park), and then it is combined with handcrafted audio features in an early fusion approach, in order to recognize the audio event of an unknown audio recording. Detailed experimentation shows that the proposed context-aware deep learning scheme, when combined with the typical handcrafted features, leads to a significant performance boost in terms of classification accuracy. The main contribution of this work is the demonstration that transferring audio contextual knowledge using CNNs as feature extractors can significantly improve the performance of the audio classifier, without the need for CNN training (a rather demanding process that requires huge datasets and complex data augmentation procedures).
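The early fusion step mentioned in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy feature values, and the per-column standardization are all assumptions made for illustration. The abstract only states that CNN-derived features and handcrafted features are combined before classification.

```python
from statistics import mean, pstdev


def zscore_columns(rows):
    """Standardize each feature column to zero mean and unit variance.

    Assumption: standardization is added here so that features on
    different scales contribute comparably; the abstract does not say so.
    """
    columns = list(zip(*rows))
    # Fall back to a divisor of 1.0 for constant columns to avoid dividing by zero.
    stats = [(mean(c), pstdev(c) or 1.0) for c in columns]
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in rows]


def early_fusion(deep_feats, handcrafted_feats):
    """Concatenate the two feature vectors of each recording, then standardize."""
    if len(deep_feats) != len(handcrafted_feats):
        raise ValueError("both feature sets must cover the same recordings")
    fused = [list(d) + list(h) for d, h in zip(deep_feats, handcrafted_feats)]
    return zscore_columns(fused)


# Hypothetical toy values: 3 recordings, 2 deep features, 3 handcrafted features.
deep = [[0.2, 0.8], [0.6, 0.1], [0.9, 0.4]]
hand = [[12.0, 3.5, 0.7], [9.0, 4.0, 0.2], [15.0, 2.5, 0.9]]
fused = early_fusion(deep, hand)
```

Each row of `fused` is a single vector per recording that a downstream classifier could consume, which is what distinguishes early fusion from combining the outputs of separate classifiers.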

