Proceedings of the ACM Web Conference 2022
DOI: 10.1145/3485447.3512260
On Explaining Multimodal Hateful Meme Detection Models

Abstract: Hateful meme detection is a new multimodal task that has gained significant traction in academic and industry research communities. Recently, researchers have applied pre-trained visual-linguistic models to perform the multimodal classification task, and some of these solutions have yielded promising results. However, what these visual-linguistic models learn for the hateful meme classification task remains unclear. For instance, it is unclear if these models are able to capture the derogatory or slurs referen…

Cited by 17 publications (7 citation statements). References 29 publications.
“…Detection of hateful memes is crucial due to their potential misuse for spreading harmful messages [12,20], misinformation [24,29] (footnote 2: https://github.com/Social-AI-Studio/MemeCraft), and propaganda [5]. Efforts to develop models for detecting harmful memes have intensified in academia and industry [1,2,8,9,13,16,30,38,52].…”
Section: Related Work 2.1 Meme Analysis and Generation
confidence: 99%
“…CLIP is a novel architecture that integrates computer vision and natural language processing. Its architecture is designed so that both the visual image and the text caption of a multimodal meme can be analyzed simultaneously to extract text and image embeddings [22][23][24]. A text encoder and an image encoder are the two primary parts of its architecture.…”
Section: CLIP Model Architecture
confidence: 99%
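The dual-encoder design described above can be illustrated with a minimal numpy sketch. This is a toy stand-in, not the real CLIP model: the `ToyDualEncoder` class, its random-projection "encoders", and all dimensions are hypothetical, chosen only to show how text and image inputs are mapped into a shared embedding space and compared by cosine similarity.

```python
import numpy as np

def l2_normalize(x):
    # project vectors onto the unit sphere so dot products equal cosine similarity
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

class ToyDualEncoder:
    """Toy CLIP-style dual encoder (hypothetical): random linear projections
    stand in for the real transformer text and image encoders."""

    def __init__(self, text_dim=32, image_dim=48, embed_dim=16, seed=0):
        rng = np.random.default_rng(seed)
        self.W_text = rng.standard_normal((text_dim, embed_dim))
        self.W_image = rng.standard_normal((image_dim, embed_dim))

    def encode_text(self, text_feats):
        # (n_texts, text_dim) -> (n_texts, embed_dim), unit-normalized
        return l2_normalize(text_feats @ self.W_text)

    def encode_image(self, image_feats):
        # (n_images, image_dim) -> (n_images, embed_dim), unit-normalized
        return l2_normalize(image_feats @ self.W_image)

    def similarity(self, text_feats, image_feats):
        # cosine similarity between every text and every image embedding
        return self.encode_text(text_feats) @ self.encode_image(image_feats).T
```

In the real CLIP the two encoders are transformers trained contrastively on image-caption pairs; the point here is only the shared-space structure that lets a meme's image and caption be scored jointly.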
“…Existing studies have explored classic two-stream models that combine the text and visual features learned from text and image encoders using attention-based mechanisms and other fusion methods to perform hateful meme classification (Zhang et al., 2020; Kiela et al., 2020; Suryawanshi et al., 2020). Another popular line of approach is fine-tuning large-scale pre-trained multimodal models for the task (Lippe et al., 2020; Zhu, 2020; Zhou and Chen, 2020; Muennighoff, 2020; Velioglu and Rose, 2020; Pramanick et al., 2021b; Hee et al., 2022). Recent studies have also attempted to use data augmentation (Zhu, 2020; Zhou and Chen, 2020; Zhu et al., 2022) and ensemble methods (Zhu, 2020; Velioglu and Rose, 2020; Sandulescu, 2020) to enhance hateful meme classification performance.…”
Section: Hateful Meme Detection
confidence: 99%
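The attention-based fusion mentioned for two-stream models can be sketched in a few lines of numpy. This is a generic single-head cross-attention sketch under stated assumptions, not any cited paper's actual method: the function name `attention_fusion`, the weight matrices, and the mean-pool-and-concatenate fusion are all illustrative choices.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_fusion(text_feats, image_feats, W_q, W_k, W_v):
    """Hypothetical single-head cross-attention fusion: meme-text token
    features (T, d) attend over image region features (R, d); the fused
    vector would feed a downstream hateful/non-hateful classifier head."""
    q = text_feats @ W_q                              # queries  (T, d)
    k = image_feats @ W_k                             # keys     (R, d)
    v = image_feats @ W_v                             # values   (R, d)
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))    # weights  (T, R)
    attended = attn @ v                               # image info per text token
    # fuse by mean-pooling each stream and concatenating -> (2d,)
    return np.concatenate([text_feats.mean(axis=0), attended.mean(axis=0)])
```

Real two-stream models differ in how they pool and combine the streams (concatenation, gating, bilinear fusion), but the query-key-value pattern above is the common attention core.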