2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr52688.2022.02058

It is Okay to Not Be Okay: Overcoming Emotional Bias in Affective Image Captioning by Contrastive Data Collection

Abstract: Figure 1: Examples from the contrastively collected dataset. On the left side of each example is the query painting with its most common emotion shown above it. The right side shows a similar painting, retrieved using VGG feature similarity, which evokes the opposite emotion. We show the old utterance for the selected image and the new utterance to highlight the increased attention to detail. Although the paired paintings have very similar styles, the triggered emotions and utterances are very different.
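The caption above describes the core of the contrastive collection protocol: for each query painting, a visually similar painting (a nearest neighbour in VGG feature space) that evokes the opposite emotion is retrieved and re-annotated. The sketch below illustrates only that retrieval step under stated assumptions; the opposite-emotion mapping, the feature arrays, and every name in it are hypothetical placeholders, not the authors' released code.

```python
import numpy as np

# Hypothetical opposite-emotion pairs; the actual dataset groups the ArtEmis
# emotions into positive/negative sets rather than strict one-to-one pairs.
OPPOSITE_EMOTION = {
    "contentment": "sadness",
    "sadness": "contentment",
    "awe": "fear",
    "fear": "awe",
}

def find_contrastive_partner(query_idx, features, emotions):
    """Return the index of the painting most visually similar to the query
    (cosine similarity over VGG features) among paintings whose dominant
    emotion is the opposite of the query's."""
    target = OPPOSITE_EMOTION[emotions[query_idx]]
    # Normalise so the dot product equals cosine similarity.
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)
    sims = normed @ normed[query_idx]          # similarity of every painting to the query
    mask = np.array([e == target for e in emotions])
    if not mask.any():
        raise ValueError(f"no painting with dominant emotion '{target}'")
    sims[~mask] = -np.inf                      # exclude paintings outside the opposite-emotion pool
    return int(np.argmax(sims))

# Toy usage: 5 paintings with random 512-d "VGG" features and dominant emotion labels.
rng = np.random.default_rng(0)
feats = rng.standard_normal((5, 512))
emos = ["contentment", "sadness", "awe", "fear", "sadness"]
j = find_contrastive_partner(0, feats, emos)
print(f"query 0 ({emos[0]}) is paired with painting {j} ({emos[j]})")
```

Restricting the nearest-neighbour search to the opposite-emotion pool is what asks annotators to justify contrasting emotions for near-identical visual content, which the caption credits with the increased attention to detail in the new utterances.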

Cited by 9 publications (10 citation statements). References 28 publications.
“…We evaluate our proposed metric, ImageCaptioner², on two datasets, i.e., the MS-COCO captioning dataset (Lin et al. 2014) for the gender and race attributes, and ArtEmis V1 (Achlioptas et al. 2021) and V2 (Mohamed et al. 2022) for the emotion attribute. For gender and race, we validate the effectiveness of our metric on a wide range of image captioning models, i.e., NIC (Vinyals et al. 2015), SAT (Xu et al. 2015), FC (Rennie et al. 2017), Att2in (Rennie et al. 2017), UpDn (Anderson et al. 2018), Transformer (Vaswani et al. 2017), OSCAR (Li et al. 2020), NIC+ (Hendricks et al. 2018), and NIC+Eq (Hendricks et al. 2018). Table 4: The bias amplification results for the gender attribute on the MS-COCO dataset for LIC and our metric using different judge models across different image captioning models.…”
Section: Experiments, Datasets and Models (mentioning)
confidence: 99%
“…The ranking of captioning models is reported in red, which indicates to what extent the metric is consistent when changing the judging model. For ArtEmis V1 (Achlioptas et al. 2021) and V2 (Mohamed et al. 2022), we explore SAT (Xu et al. 2015) and Emotion-Grounded SAT (EG-SAT) (Achlioptas et al. 2021) with its variants. The EG-SAT is an adapted version of SAT that incorporates the emotional signal into the speaker to generate controlled text.…”
Section: Experiments, Datasets and Models (mentioning)
confidence: 99%
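The citation above contrasts SAT with Emotion-Grounded SAT (EG-SAT), which feeds an emotion signal to the speaker so the generated caption is conditioned on a target emotion. The following is a minimal, hypothetical sketch of that kind of conditioning, assuming one common design (embedding the emotion label and mixing it into the decoder's initial state); it illustrates the idea only and is not the EG-SAT implementation of Achlioptas et al.

```python
import torch
import torch.nn as nn

class EmotionGroundedSpeaker(nn.Module):
    """Toy caption decoder conditioned on an emotion label (illustrative only)."""

    def __init__(self, vocab_size, num_emotions=9, img_dim=512, emb_dim=256, hid_dim=512):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, emb_dim)
        self.emotion_emb = nn.Embedding(num_emotions, emb_dim)
        # The decoder's initial hidden state is built from image features plus the emotion embedding.
        self.init_h = nn.Linear(img_dim + emb_dim, hid_dim)
        self.lstm = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, img_feats, emotion_ids, captions):
        # img_feats: (B, img_dim); emotion_ids: (B,); captions: (B, T) token ids
        emo = self.emotion_emb(emotion_ids)                          # (B, emb_dim)
        h0 = torch.tanh(self.init_h(torch.cat([img_feats, emo], dim=1)))
        h0 = h0.unsqueeze(0)                                         # (1, B, hid_dim)
        c0 = torch.zeros_like(h0)
        words = self.word_emb(captions)                              # (B, T, emb_dim)
        hidden, _ = self.lstm(words, (h0, c0))
        return self.out(hidden)                                      # (B, T, vocab_size) logits

# Toy usage: a batch of 2 images, 2 emotion ids, and 5-token caption prefixes.
model = EmotionGroundedSpeaker(vocab_size=1000)
logits = model(torch.randn(2, 512), torch.tensor([3, 7]), torch.randint(0, 1000, (2, 5)))
print(logits.shape)  # torch.Size([2, 5, 1000])
```

Because the emotion embedding enters the decoder's initial state, swapping the emotion id at inference time is what lets such a speaker produce different, emotion-controlled captions for the same image.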
“…Unlike the previous method, we introduce an AVE to integrate emotion attributes into visual information, which enables our model to consider the given emotions more deeply and produce emotion-conditioned sentences. [25] recently published ArtEmis v2.0, an extension of ArtEmis [26]. It proposes a contrastive data collection approach that balances ArtEmis with a new complementary dataset in which pairs of similar artworks evoke contrasting emotions.…”
Section: Related Work (mentioning)
confidence: 99%
“…We only include text-based SM tasks. Another improvement direction is to extend this benchmark to tasks that involve more modalities, such as affective image captioning (Mohamed et al., 2022) and multi-modal emotion recognition (Firdaus et al., 2020).…”
Section: Limitations (mentioning)
confidence: 99%