Proceedings of the ACM Web Conference 2022
DOI: 10.1145/3485447.3512260
On Explaining Multimodal Hateful Meme Detection Models

Abstract: Hateful meme detection is a new multimodal task that has gained significant traction in academic and industry research communities. Recently, researchers have applied pre-trained visual-linguistic models to perform the multimodal classification task, and some of these solutions have yielded promising results. However, what these visual-linguistic models learn for the hateful meme classification task remains unclear. For instance, it is unclear if these models are able to capture the derogatory or slurs referen…

Cited by 17 publications (7 citation statements). References 29 publications.
“…Detection of hateful memes is crucial due to their potential misuse for spreading harmful messages [12,20], misinformation [24,29] (footnote 2: https://github.com/Social-AI-Studio/MemeCraft), and propaganda [5]. Efforts to develop models for detecting harmful memes have intensified in academia and industry [1,2,8,9,13,16,30,38,52].…”
Section: Related Work 2.1 Meme Analysis and Generation
confidence: 99%
“…CLIP is a novel architecture that integrates computer vision and natural language processing. Its architecture is designed so that both the visual image and the text caption of a multimodal meme can be analyzed simultaneously to extract text and image embeddings [22][23][24]. A text encoder and an image encoder are the two primary parts of its architecture.…”
Section: CLIP Model Architecture
confidence: 99%
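The dual-encoder design described above can be illustrated with a minimal numpy sketch. This is a toy stand-in, not the real CLIP model: the `ToyDualEncoder` class, its random-projection "encoders", and all dimensions are hypothetical, chosen only to show how text and image inputs are mapped into a shared embedding space and compared by cosine similarity.

```python
import numpy as np

def l2_normalize(x):
    # project vectors onto the unit sphere so dot products equal cosine similarity
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

class ToyDualEncoder:
    """Toy CLIP-style dual encoder (hypothetical): random linear projections
    stand in for the real transformer text and image encoders."""

    def __init__(self, text_dim=32, image_dim=48, embed_dim=16, seed=0):
        rng = np.random.default_rng(seed)
        self.W_text = rng.standard_normal((text_dim, embed_dim))
        self.W_image = rng.standard_normal((image_dim, embed_dim))

    def encode_text(self, text_feats):
        # (n_texts, text_dim) -> (n_texts, embed_dim), unit-normalized
        return l2_normalize(text_feats @ self.W_text)

    def encode_image(self, image_feats):
        # (n_images, image_dim) -> (n_images, embed_dim), unit-normalized
        return l2_normalize(image_feats @ self.W_image)

    def similarity(self, text_feats, image_feats):
        # cosine similarity between every text and every image embedding
        return self.encode_text(text_feats) @ self.encode_image(image_feats).T
```

In the real CLIP the two encoders are transformers trained contrastively on image-caption pairs; the point here is only the shared-space structure that lets a meme's image and caption be scored jointly.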
“…Existing studies have explored classic two-stream models that combine the text and visual features learned from text and image encoders using attention-based mechanisms and other fusion methods to perform hateful meme classification (Zhang et al., 2020; Kiela et al., 2020; Suryawanshi et al., 2020). Another popular line of approach is fine-tuning large-scale pre-trained multimodal models for the task (Lippe et al., 2020; Zhu, 2020; Zhou and Chen, 2020; Muennighoff, 2020; Velioglu and Rose, 2020; Pramanick et al., 2021b; Hee et al., 2022). Recent studies have also attempted to use data augmentation (Zhu, 2020; Zhou and Chen, 2020; Zhu et al., 2022) and ensemble methods (Zhu, 2020; Velioglu and Rose, 2020; Sandulescu, 2020) to enhance hateful meme classification performance.…”
Section: Hateful Meme Detection
confidence: 99%
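The attention-based fusion mentioned for two-stream models can be sketched in a few lines of numpy. This is a generic single-head cross-attention sketch under stated assumptions, not any cited paper's actual method: the function name `attention_fusion`, the weight matrices, and the mean-pool-and-concatenate fusion are all illustrative choices.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_fusion(text_feats, image_feats, W_q, W_k, W_v):
    """Hypothetical single-head cross-attention fusion: meme-text token
    features (T, d) attend over image region features (R, d); the fused
    vector would feed a downstream hateful/non-hateful classifier head."""
    q = text_feats @ W_q                              # queries  (T, d)
    k = image_feats @ W_k                             # keys     (R, d)
    v = image_feats @ W_v                             # values   (R, d)
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))    # weights  (T, R)
    attended = attn @ v                               # image info per text token
    # fuse by mean-pooling each stream and concatenating -> (2d,)
    return np.concatenate([text_feats.mean(axis=0), attended.mean(axis=0)])
```

Real two-stream models differ in how they pool and combine the streams (concatenation, gating, bilinear fusion), but the query-key-value pattern above is the common attention core.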