Findings of the Association for Computational Linguistics: EMNLP 2021
DOI: 10.18653/v1/2021.findings-emnlp.379

MOMENTA: A Multimodal Framework for Detecting Harmful Memes and Their Targets

Abstract: Internet memes have become powerful means to transmit political, psychological, and sociocultural ideas. Although memes are typically humorous, recent days have witnessed an escalation of harmful memes used for trolling, cyberbullying, and abuse. Detecting such memes is challenging as they can be highly satirical and cryptic. Moreover, while previous work has focused on specific aspects of memes such as hate speech and propaganda, there has been little work on harm in general. Here, we aim to bridge this gap. …

Citations: cited by 51 publications (71 citation statements)
References: 39 publications
“…Since the relative importance of the two branches depends upon the structure of the input image, we attentively fuse the CLS tokens from the last layer of each branch. Motivated by [28,29], we design our attention module with two major parts: modality attention generation and weighted concatenation. In the first part, a sequence of dense layers followed by a softmax layer is used to generate the attention scores w_mm = [w_rgb, w_seg] for the two branches.…”
Section: Semantic Segmentation For Robustness To Appearance Variation
Type: mentioning
confidence: 99%
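
The fusion step quoted above can be illustrated with a minimal sketch. It is not the citing authors' implementation: the quoted text only says that dense layers followed by a softmax produce the scores w_mm = [w_rgb, w_seg], and names the second part "weighted concatenation". Everything else here is an assumption made for illustration: feeding the concatenated CLS vectors to the attention network, the ReLU activations, the layer sizes, the random weights, and the helper names (`dense`, `init_layer`, `attentive_fusion`).

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dense(vector, weights, bias):
    """One dense layer; weights holds one row per output unit."""
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


def relu(vector):
    return [max(0.0, v) for v in vector]


def init_layer(rng, n_in, n_out):
    """Hypothetical random initialisation; a trained model would learn these."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def attentive_fusion(cls_rgb, cls_seg, layers):
    """Fuse two CLS vectors: modality attention, then weighted concatenation."""
    # Assumption: the attention network sees both CLS vectors concatenated.
    hidden = cls_rgb + cls_seg
    for weights, bias in layers[:-1]:
        hidden = relu(dense(hidden, weights, bias))
    weights, bias = layers[-1]
    # Part 1: softmax over two logits gives w_mm = [w_rgb, w_seg].
    w_rgb, w_seg = softmax(dense(hidden, weights, bias))
    # Part 2: scale each branch by its score, then concatenate.
    fused = [w_rgb * x for x in cls_rgb] + [w_seg * x for x in cls_seg]
    return fused, (w_rgb, w_seg)


rng = random.Random(0)
dim = 4  # toy CLS dimension, chosen only for the example
layers = [init_layer(rng, 2 * dim, 8), init_layer(rng, 8, 2)]
cls_rgb = [rng.gauss(0, 1) for _ in range(dim)]
cls_seg = [rng.gauss(0, 1) for _ in range(dim)]
fused, (w_rgb, w_seg) = attentive_fusion(cls_rgb, cls_seg, layers)
```

Because the scores come from a softmax, they are positive and sum to one, so the fused vector keeps both branches while letting the network weight one above the other per input.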
“…Additionally, the work around flagging harmful content has focused majorly on text-based features as they are easier to collect. Meanwhile, the usage of memes and videos (short clips and long ones) spreading toxic and harmful content has been gaining momentum [43,63,64]. We need to study the impact of bias in multi-modal content.…”
Section: Case Study: Shift In Bias Due To Knowledge-based Generalizat...
Type: mentioning
confidence: 99%
“…Researchers explored the online content from social media even further and began focusing on the multi-modal data [27,28], including internet memes. Efforts to automatically detect the offensive [29] or harmful memes [30] are being made to help the content moderators in charge of removing the posts containing hate speech.…”
Section: Related Work
Type: mentioning
confidence: 99%
“…Several previous methods have reported using multi-modal approaches in the computational pipeline [27,28,29,30]. In our pipeline, however, we explore semantic image features in two ways: i) direct image features provided by a pretrained EfficientNetV4 [48] on ImageNet dataset [51], and ii) features from the image encoder of CLIP [49].…”
Section: Multi-modal-multi-task Transformer (Mmmt)
Type: mentioning
confidence: 99%