2020 15th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2020)
DOI: 10.1109/fg47880.2020.00056

Two-Stream Aural-Visual Affect Analysis in the Wild

Cited by 45 publications (32 citation statements); references 8 publications.

“…Dynamic representation-learning approaches possess an inherent advantage and become potential candidates for further consideration. To perform the task at hand, we shortlisted Meng et al (2019), Kuo et al (2018), Gera and Balasubramanian (2020), Savchenko (2021), and Kuhnke et al (2020) based on factors such as performance on open-source FER data sets like CK+ (Lucey et al, 2010) and AFEW (Kossaifi et al, 2017), the depth of the neural network used (which determines the minimum amount of data required for training), and the reproducibility of the results claimed by the authors. Of the five, Frame Attention Networks (FAN) (Meng et al, 2019) is chosen for its state-of-the-art accuracy on the CK+ (99%) and AFEW (51.18%) data sets and its simple yet effective construction.…”
Section: Related Work (mentioning)
confidence: 99%

“…Kuhnke and Rumberg [14] proposed a two-stream aural-visual model. The audio and image streams are first processed separately, each fed into its own CNN.…”
Section: Related Work (mentioning)
confidence: 99%

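The description above maps onto a simple late-fusion layout: one CNN per modality, with the two feature vectors concatenated before a joint prediction head. Below is a minimal sketch of that idea, assuming log-mel-spectrogram audio patches and cropped face images as inputs; the class name `TwoStreamAffectModel`, the layer sizes, and the input resolutions are illustrative assumptions, not the authors' architecture.

```python
# Minimal sketch of a late-fusion two-stream aural-visual model (illustrative,
# not the published implementation): a small 2D CNN over log-mel spectrograms
# and a small 2D CNN over face crops, fused by concatenation into a joint head.
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(2),
    )

class TwoStreamAffectModel(nn.Module):
    def __init__(self, num_outputs=2):            # e.g. valence and arousal
        super().__init__()
        self.audio_stream = nn.Sequential(         # input: (B, 1, 64, 64) spectrogram patch
            conv_block(1, 16), conv_block(16, 32), nn.AdaptiveAvgPool2d(1)
        )
        self.visual_stream = nn.Sequential(        # input: (B, 3, 112, 112) face crop
            conv_block(3, 16), conv_block(16, 32), nn.AdaptiveAvgPool2d(1)
        )
        self.head = nn.Linear(32 + 32, num_outputs)

    def forward(self, spectrogram, face):
        a = self.audio_stream(spectrogram).flatten(1)   # (B, 32) audio feature
        v = self.visual_stream(face).flatten(1)         # (B, 32) visual feature
        return self.head(torch.cat([a, v], dim=1))      # fused prediction

model = TwoStreamAffectModel()
out = model(torch.randn(2, 1, 64, 64), torch.randn(2, 3, 112, 112))
print(out.shape)  # torch.Size([2, 2])
```
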
“…ABAW consists of three challenges on the same dataset, Aff-Wild2 [10]: dimensional affect recognition (in terms of valence and arousal), categorical affect classification (in terms of the seven basic emotions), and detection of 12 facial action units. Most of the top-ranked teams in ABAW1, which was held in conjunction with FG2020, proposed deep-learning-based multitask models that output predictions for all three tasks at once [1,14]. As input, the corresponding image is typically used, and additional (previous or subsequent) images are used to further leverage temporal information [1,14,16].…”
Section: Related Work 2.1 ABAW (mentioning)
confidence: 99%

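To make "output predictions for all three tasks at once" concrete, here is a minimal sketch of a multi-task output layer on top of a shared backbone feature; the name `MultiTaskAffectHead`, the feature dimension, and the choice of activations are illustrative assumptions rather than any particular team's model.

```python
# Minimal sketch of a multi-task affect head over a shared backbone feature
# (illustrative assumption, not a specific team's model): a regression head for
# valence/arousal, a 7-way expression classifier, and a 12-unit AU detector.
import torch
import torch.nn as nn

class MultiTaskAffectHead(nn.Module):
    def __init__(self, feat_dim=512):
        super().__init__()
        self.va = nn.Linear(feat_dim, 2)       # valence, arousal in [-1, 1] via tanh
        self.expr = nn.Linear(feat_dim, 7)     # logits over the 7 basic emotions
        self.au = nn.Linear(feat_dim, 12)      # logits for 12 action units (multi-label)

    def forward(self, features):
        return {
            "valence_arousal": torch.tanh(self.va(features)),
            "expression_logits": self.expr(features),
            "au_logits": self.au(features),    # use sigmoid / BCE-with-logits in the loss
        }

head = MultiTaskAffectHead()
outputs = head(torch.randn(4, 512))
print({k: v.shape for k, v in outputs.items()})
```
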
“…Most of the top-ranked teams in the first challenge of ABAW (ABAW1) [6], held in conjunction with the 15th IEEE Conference on Face and Gesture Recognition (FG2020), used convolutional neural networks (CNNs) with single facial images or sequences of such images. Where a single image was used, the captured image was fed directly into the recognizer; the teams that used image sequences additionally combined past or future images with the image captured at that point [1,14,16]. Although these methods perform well with large-scale data in the wild, they encounter limitations when used in real time.…”
Section: Introduction (mentioning)
confidence: 99%