2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr52688.2022.02058

It is Okay to Not Be Okay: Overcoming Emotional Bias in Affective Image Captioning by Contrastive Data Collection

Abstract: Figure 1: Examples from the contrastively collected dataset. On the left side of each example is the query painting with its most common emotion shown above it. The right side shows a similar painting, retrieved using VGG feature similarity, which evokes the opposite emotion. We show the old utterance for the selected image and the new utterance to highlight the increased attention to detail. Although the paired paintings have very similar styles, the triggered emotions and utterances are very different.
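The caption above describes the core of the contrastive collection protocol: for each query painting, a visually similar painting (a nearest neighbour in VGG feature space) that evokes the opposite emotion is retrieved and re-annotated. The sketch below illustrates only that retrieval step under stated assumptions; the opposite-emotion mapping, the feature arrays, and every name in it are hypothetical placeholders, not the authors' released code.

```python
import numpy as np

# Hypothetical opposite-emotion pairs; the actual dataset groups the ArtEmis
# emotions into positive/negative sets rather than strict one-to-one pairs.
OPPOSITE_EMOTION = {
    "contentment": "sadness",
    "sadness": "contentment",
    "awe": "fear",
    "fear": "awe",
}

def find_contrastive_partner(query_idx, features, emotions):
    """Return the index of the painting most visually similar to the query
    (cosine similarity over VGG features) among paintings whose dominant
    emotion is the opposite of the query's."""
    target = OPPOSITE_EMOTION[emotions[query_idx]]
    # Normalise so the dot product equals cosine similarity.
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)
    sims = normed @ normed[query_idx]          # similarity of every painting to the query
    mask = np.array([e == target for e in emotions])
    if not mask.any():
        raise ValueError(f"no painting with dominant emotion '{target}'")
    sims[~mask] = -np.inf                      # exclude paintings outside the opposite-emotion pool
    return int(np.argmax(sims))

# Toy usage: 5 paintings with random 512-d "VGG" features and dominant emotion labels.
rng = np.random.default_rng(0)
feats = rng.standard_normal((5, 512))
emos = ["contentment", "sadness", "awe", "fear", "sadness"]
j = find_contrastive_partner(0, feats, emos)
print(f"query 0 ({emos[0]}) is paired with painting {j} ({emos[j]})")
```

Restricting the nearest-neighbour search to the opposite-emotion pool is what asks annotators to justify contrasting emotions for near-identical visual content, which the caption credits with the increased attention to detail in the new utterances.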

Cited by 9 publications (10 citation statements). References 28 publications.
“…We evaluate our proposed metric, ImageCaptioner², on two datasets, i.e., the MS-COCO captioning dataset (Lin et al. 2014) for the gender and race attributes, and ArtEmis V1 (Achlioptas et al. 2021) and V2 (Mohamed et al. 2022) for the emotion attribute. For gender and race, we validate the effectiveness of our metric on a wide range of image captioning models, i.e., NIC (Vinyals et al. 2015), SAT (Xu et al. 2015), FC (Rennie et al. 2017), Att2in (Rennie et al. 2017), UpDn (Anderson et al. 2018), Transformer (Vaswani et al. 2017), OSCAR (Li et al. 2020), NIC+ (Hendricks et al. 2018), and NIC+Eq (Hendricks et al. 2018). Table 4: The bias amplification results for the gender attribute on the MS-COCO dataset for LIC and our metric using different judge models across different image captioning models.…”
Section: Experiments, Datasets and Models (mentioning)
confidence: 99%
“…The ranking of captioning models is reported in red, which indicates to what extent the metric is consistent when changing the judging model. For ArtEmis V1 (Achlioptas et al. 2021) and V2 (Mohamed et al. 2022), we explore SAT (Xu et al. 2015) and Emotion-Grounded SAT (EG-SAT) (Achlioptas et al. 2021) with its variants. The EG-SAT is an adapted version of SAT that incorporates the emotional signal into the speaker to generate controlled text.…”
Section: Experiments, Datasets and Models (mentioning)
confidence: 99%
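The citation above contrasts SAT with Emotion-Grounded SAT (EG-SAT), which feeds an emotion signal to the speaker so the generated caption is conditioned on a target emotion. The following is a minimal, hypothetical sketch of that kind of conditioning, assuming one common design (embedding the emotion label and mixing it into the decoder's initial state); it illustrates the idea only and is not the EG-SAT implementation of Achlioptas et al.

```python
import torch
import torch.nn as nn

class EmotionGroundedSpeaker(nn.Module):
    """Toy caption decoder conditioned on an emotion label (illustrative only)."""

    def __init__(self, vocab_size, num_emotions=9, img_dim=512, emb_dim=256, hid_dim=512):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, emb_dim)
        self.emotion_emb = nn.Embedding(num_emotions, emb_dim)
        # The decoder's initial hidden state is built from image features plus the emotion embedding.
        self.init_h = nn.Linear(img_dim + emb_dim, hid_dim)
        self.lstm = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, img_feats, emotion_ids, captions):
        # img_feats: (B, img_dim); emotion_ids: (B,); captions: (B, T) token ids
        emo = self.emotion_emb(emotion_ids)                          # (B, emb_dim)
        h0 = torch.tanh(self.init_h(torch.cat([img_feats, emo], dim=1)))
        h0 = h0.unsqueeze(0)                                         # (1, B, hid_dim)
        c0 = torch.zeros_like(h0)
        words = self.word_emb(captions)                              # (B, T, emb_dim)
        hidden, _ = self.lstm(words, (h0, c0))
        return self.out(hidden)                                      # (B, T, vocab_size) logits

# Toy usage: a batch of 2 images, 2 emotion ids, and 5-token caption prefixes.
model = EmotionGroundedSpeaker(vocab_size=1000)
logits = model(torch.randn(2, 512), torch.tensor([3, 7]), torch.randint(0, 1000, (2, 5)))
print(logits.shape)  # torch.Size([2, 5, 1000])
```

Because the emotion embedding enters the decoder's initial state, swapping the emotion id at inference time is what lets such a speaker produce different, emotion-controlled captions for the same image.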
“…Unlike the previous method, we introduce an AVE to integrate emotion attributes into visual information, which enables our model to consider the given emotions more deeply and produce emotion-conditioned sentences. [25] recently published ArtEmis v2.0, an extension of ArtEmis [26]. It proposes a contrastive data collection approach that balances ArtEmis with a new complementary dataset in which pairs of similar artworks evoke contrasting emotions.…”
Section: Related Work (mentioning)
confidence: 99%
“…We only include text-based SM tasks. Another improvement direction is to extend this benchmark to tasks that involve more modalities, such as affective image captioning (Mohamed et al., 2022) and multi-modal emotion recognition (Firdaus et al., 2020).…”
Section: Limitations (mentioning)
confidence: 99%