2020 15th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2020)
DOI: 10.1109/fg47880.2020.00056

Two-Stream Aural-Visual Affect Analysis in the Wild

Cited by 45 publications (32 citation statements); references 8 publications.

“…Dynamic representation-learning approaches possess an inherent advantage and become potential candidates for further consideration. To perform the task at hand, we shortlisted Meng et al (2019), Kuo et al (2018), Gera and Balasubramanian (2020), Savchenko (2021), and Kuhnke et al (2020) based on factors such as performance on open-source FER data sets like CK+ (Lucey et al, 2010) and AFEW (Kossaifi et al, 2017), the depth of the neural network used (which determines the minimum amount of data required for training), and the reproducibility of the results claimed by the authors. Of the five, Frame Attention Networks (FAN) (Meng et al, 2019) is chosen for its state-of-the-art accuracy on the CK+ (99%) and AFEW (51.18%) data sets and its simple yet effective construction.…”
Section: Related Work (mentioning)
confidence: 99%

“…Kuhnke and Rumberg [14] proposed a two-stream aural-visual model. The audio and image streams are first processed separately, each fed into its own CNN.…”
Section: Related Work (mentioning)
confidence: 99%

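The description above maps onto a simple late-fusion layout: one CNN per modality, with the two feature vectors concatenated before a joint prediction head. Below is a minimal sketch of that idea, assuming log-mel-spectrogram audio patches and cropped face images as inputs; the class name `TwoStreamAffectModel`, the layer sizes, and the input resolutions are illustrative assumptions, not the authors' architecture.

```python
# Minimal sketch of a late-fusion two-stream aural-visual model (illustrative,
# not the published implementation): a small 2D CNN over log-mel spectrograms
# and a small 2D CNN over face crops, fused by concatenation into a joint head.
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(2),
    )

class TwoStreamAffectModel(nn.Module):
    def __init__(self, num_outputs=2):            # e.g. valence and arousal
        super().__init__()
        self.audio_stream = nn.Sequential(         # input: (B, 1, 64, 64) spectrogram patch
            conv_block(1, 16), conv_block(16, 32), nn.AdaptiveAvgPool2d(1)
        )
        self.visual_stream = nn.Sequential(        # input: (B, 3, 112, 112) face crop
            conv_block(3, 16), conv_block(16, 32), nn.AdaptiveAvgPool2d(1)
        )
        self.head = nn.Linear(32 + 32, num_outputs)

    def forward(self, spectrogram, face):
        a = self.audio_stream(spectrogram).flatten(1)   # (B, 32) audio feature
        v = self.visual_stream(face).flatten(1)         # (B, 32) visual feature
        return self.head(torch.cat([a, v], dim=1))      # fused prediction

model = TwoStreamAffectModel()
out = model(torch.randn(2, 1, 64, 64), torch.randn(2, 3, 112, 112))
print(out.shape)  # torch.Size([2, 2])
```
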
“…ABAW consists of three challenges on the same dataset, Aff-Wild2 [10]: dimensional affect recognition (in terms of valence and arousal), categorical affect classification (in terms of the seven basic emotions), and detection of 12 facial action units. Most of the top-ranked teams in ABAW1, which was held in conjunction with FG2020, proposed deep-learning-based multitask models that output predictions for all three tasks at once [1,14]. As input, the corresponding image is typically used, and additional (previous or subsequent) images are used to further leverage temporal information [1,14,16].…”
Section: Related Work 2.1 ABAW (mentioning)
confidence: 99%

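To make "output predictions for all three tasks at once" concrete, here is a minimal sketch of a multi-task output layer on top of a shared backbone feature; the name `MultiTaskAffectHead`, the feature dimension, and the choice of activations are illustrative assumptions rather than any particular team's model.

```python
# Minimal sketch of a multi-task affect head over a shared backbone feature
# (illustrative assumption, not a specific team's model): a regression head for
# valence/arousal, a 7-way expression classifier, and a 12-unit AU detector.
import torch
import torch.nn as nn

class MultiTaskAffectHead(nn.Module):
    def __init__(self, feat_dim=512):
        super().__init__()
        self.va = nn.Linear(feat_dim, 2)       # valence, arousal in [-1, 1] via tanh
        self.expr = nn.Linear(feat_dim, 7)     # logits over the 7 basic emotions
        self.au = nn.Linear(feat_dim, 12)      # logits for 12 action units (multi-label)

    def forward(self, features):
        return {
            "valence_arousal": torch.tanh(self.va(features)),
            "expression_logits": self.expr(features),
            "au_logits": self.au(features),    # use sigmoid / BCE-with-logits in the loss
        }

head = MultiTaskAffectHead()
outputs = head(torch.randn(4, 512))
print({k: v.shape for k, v in outputs.items()})
```
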
“…Most of the top-ranked teams in the first challenge of ABAW (ABAW1) [6], held in conjunction with the 15th IEEE Conference on Face and Gesture Recognition (FG2020), used convolutional neural networks (CNNs) with single facial images or sequences of such images. Where a single image was used, the captured image was fed directly into the recognizer; the teams that used image sequences additionally combined past or future images with the image captured at that point [1,14,16]. Although these methods perform well with large-scale data in the wild, they encounter limitations when used in real time.…”
Section: Introduction (mentioning)
confidence: 99%