2020
DOI: 10.48550/arxiv.2009.14440
Preprint

Affect Expression Behaviour Analysis in the Wild using Spatio-Channel Attention and Complementary Context Information

Cited by 8 publications (8 citation statements). References 17 publications.
“…Dynamic representation-learning approaches possess an inherent advantage and become potential candidates for further consideration. To perform the task at hand, we shortlisted Meng et al (2019), Kuo et al (2018), Gera and Balasubramanian (2020), Savchenko (2021), and Kuhnke et al (2020) based on factors such as performance on open-source FER data sets like CK+ (Lucey et al, 2010) and AFEW (Kossaifi et al, 2017), the depth of the neural network used (which determines the minimum amount of data required for training), and the reproducibility of the results claimed by the authors. Of the five, Frame Attention Networks (FAN) (Meng et al, 2019) was chosen for its state-of-the-art accuracy on the CK+ (99%) and AFEW (51.18%) data sets and its simple yet effective construction.…”
Section: Related Work (mentioning, confidence: 99%)
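As a reading aid for the statement above, here is a minimal sketch of the frame-level attention pooling that FAN-style models use to aggregate per-frame features into a clip-level descriptor. The feature dimension, the single linear scoring layer, and the sigmoid normalization are illustrative assumptions, not the cited authors' exact design.

```python
# Minimal sketch of frame-level attention pooling in the spirit of
# Frame Attention Networks (FAN): per-frame CNN features are weighted by
# learned attention scores and pooled into one video-level descriptor.
# Layer sizes are illustrative assumptions.
import torch
import torch.nn as nn


class FrameAttentionPool(nn.Module):
    def __init__(self, feat_dim: int = 512):
        super().__init__()
        # one scalar attention score per frame feature
        self.score = nn.Linear(feat_dim, 1)

    def forward(self, frame_feats: torch.Tensor) -> torch.Tensor:
        # frame_feats: (batch, num_frames, feat_dim)
        weights = torch.sigmoid(self.score(frame_feats))      # (B, T, 1)
        weights = weights / weights.sum(dim=1, keepdim=True)  # normalize over frames
        return (weights * frame_feats).sum(dim=1)             # (B, feat_dim)


# usage: pool 16 per-frame features into one clip-level vector
pool = FrameAttentionPool(feat_dim=512)
clip_feat = pool(torch.randn(2, 16, 512))  # -> shape (2, 512)
```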
“…To address the problems of unbalanced data and missing labels, Deng et al [7] propose a teacher-student structure that learns from unlabelled data via soft labels. Besides the multi-task frameworks, Gera et al [8] focus on the task of discrete emotion classification and propose a network based on an attention mechanism. Zhang et al [9] propose a multi-modal approach, M3T, for valence-arousal estimation that combines visual features extracted by a 3D convolutional network and a bidirectional recurrent neural network with audio features extracted by an acoustic sub-network.…”
Section: Automatic Affective Behavior Analysis (mentioning, confidence: 99%)
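The M3T description above amounts to fusing a temporal visual stream with an audio stream for valence-arousal regression. The sketch below illustrates that idea under stated assumptions (a bidirectional GRU as the recurrent model, concatenation fusion, and arbitrary feature sizes); it is not the cited architecture.

```python
# Rough sketch of audio-visual fusion for valence-arousal regression.
# Dimensions, the bidirectional GRU, and late concatenation fusion are
# illustrative assumptions, not the cited authors' exact design.
import torch
import torch.nn as nn


class AudioVisualVA(nn.Module):
    def __init__(self, vis_dim: int = 512, aud_dim: int = 128, hidden: int = 256):
        super().__init__()
        # temporal model over per-clip visual features (e.g. from a 3D CNN)
        self.rnn = nn.GRU(vis_dim, hidden, batch_first=True, bidirectional=True)
        # fuse pooled visual state with the audio embedding, regress (valence, arousal)
        self.head = nn.Sequential(
            nn.Linear(2 * hidden + aud_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 2),
            nn.Tanh(),  # valence/arousal are typically bounded in [-1, 1]
        )

    def forward(self, vis_seq: torch.Tensor, aud_feat: torch.Tensor) -> torch.Tensor:
        # vis_seq: (B, T, vis_dim), aud_feat: (B, aud_dim)
        out, _ = self.rnn(vis_seq)
        fused = torch.cat([out.mean(dim=1), aud_feat], dim=-1)
        return self.head(fused)  # (B, 2): valence, arousal


va = AudioVisualVA()(torch.randn(2, 32, 512), torch.randn(2, 128))  # -> (2, 2)
```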
“…Unlike most existing facial emotion datasets [3,4,5,6], which contain only one of the three commonly used emotional representations, Categorical Emotions (CE), Action Units (AU), and Valence-Arousal (VA), the Aff-Wild2 [2] dataset is annotated with all three kinds of emotional labels and extends the former Aff-Wild [1] dataset with facial behaviors recorded in arbitrary, in-the-wild conditions and with more subjects/frames. Consequently, multi-task affective recognition can benefit from it; for example, the works [7,8,9,10] participated in the first Affective Behavior Analysis in-the-wild (ABAW) Competition [11].…”
Section: Introduction (mentioning, confidence: 99%)
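Since Aff-Wild2 carries all three label types, a multi-task model typically attaches three heads to one shared backbone feature. A minimal sketch, assuming a 512-dimensional backbone feature, 7 emotion categories, and 12 action units (all illustrative choices):

```python
# Minimal sketch of a shared-backbone multi-task head covering the three label
# types annotated in Aff-Wild2: categorical emotions (CE), action units (AU),
# and valence-arousal (VA). Feature size and class/AU counts are assumptions.
import torch
import torch.nn as nn


class MultiTaskAffectHead(nn.Module):
    def __init__(self, feat_dim: int = 512, n_emotions: int = 7, n_aus: int = 12):
        super().__init__()
        self.ce_head = nn.Linear(feat_dim, n_emotions)  # softmax classification
        self.au_head = nn.Linear(feat_dim, n_aus)       # independent sigmoids (multi-label)
        self.va_head = nn.Linear(feat_dim, 2)           # valence, arousal regression

    def forward(self, feat: torch.Tensor) -> dict:
        return {
            "ce": self.ce_head(feat),                 # logits over emotion classes
            "au": torch.sigmoid(self.au_head(feat)),  # per-AU activation probabilities
            "va": torch.tanh(self.va_head(feat)),     # bounded valence/arousal
        }


outputs = MultiTaskAffectHead()(torch.randn(4, 512))
print({k: v.shape for k, v in outputs.items()})
```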
“…A. W. Yip et al [14] compared the accuracy of face recognition on color images and gray-scale images and found that there was almost no difference in accuracy when the resolution is sufficiently high. They also showed that if a pseudo-color image with adjusted color tones is generated from a gray-scale image, the accuracy is equal to or higher than that of a color image even at low resolution. For emotion estimation, it has been shown that estimation accuracy improves when facial features are extracted with a ResNet pre-trained on the VGGFace2 dataset [15] [16]. It has also been suggested that the accuracy of emotion estimation can be improved by learning from multi-modal information that includes audio as well as video [16] [17].…”
Section: Related Work (mentioning, confidence: 99%)
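The feature-extraction step mentioned in the last statement can be sketched as follows. This is a hypothetical stand-in: torchvision's ImageNet-pretrained ResNet-50 is used here only so the snippet runs, whereas the cited works assume weights pre-trained on VGGFace2, which would have to be loaded from such a checkpoint instead.

```python
# Hypothetical sketch of using a ResNet backbone as a facial feature extractor
# for emotion estimation. ImageNet weights are a stand-in for the VGGFace2
# pre-training assumed by the cited works.
import torch
import torch.nn as nn
from torchvision import models

backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
backbone.fc = nn.Identity()  # drop the classifier, keep 2048-d features
backbone.eval()

with torch.no_grad():
    faces = torch.randn(4, 3, 224, 224)  # batch of cropped, normalized face images
    feats = backbone(faces)              # (4, 2048) face descriptors

# an emotion classifier on top of the features (7 basic categories, an assumption)
emotion_head = nn.Linear(2048, 7)
logits = emotion_head(feats)
```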