MOMENTA: A Multimodal Framework for Detecting Harmful Memes and Their Targets

Pramanick, Shraman; Sharma, Shivam; Dimitrov, Dimitar; Akhtar, Md. Shad; Nakov, Preslav; Chakraborty, Tanmoy

doi:10.18653/v1/2021.findings-emnlp.379

Cited by 51 publications

(71 citation statements)

References 39 publications


“…Since the relative importance of the two branches depends upon the structure of the input image, we attentively fuse the CLS tokens from the last layer of each branch. Motivated by [28,29], we design our attention module with two major parts: modality attention generation and weighted concatenation. In the first part, a sequence of dense layers followed by a softmax layer is used to generate the attention scores $w_{mm} = [w_{rgb}, w_{seg}]$ for the two branches.…”

Section: Semantic Segmentation For Robustness To Appearance Variation (mentioning)

confidence: 99%

Where in the World is this Image? Transformer-based Geo-localization in the Wild

Pramanick, Nowara, Gleason et al. 2022

Preprint

Self Cite


Predicting the geographic location (geo-localization) from a single ground-level RGB image taken anywhere in the world is a very challenging problem. The challenges include the huge diversity of images due to different environmental scenarios, drastic variation in the appearance of the same location depending on the time of day, weather, and season, and, more importantly, the fact that the prediction is made from a single image possibly having only a few geo-locating cues. For these reasons, most existing works are restricted to specific cities, imagery, or worldwide landmarks. In this work, we focus on developing an efficient solution to planet-scale single-image geo-localization. To this end, we propose TransLocator, a unified dual-branch transformer network that attends to tiny details over the entire image and produces robust feature representations under extreme appearance variations. TransLocator takes an RGB image and its semantic segmentation map as inputs, lets its two parallel branches interact after each transformer layer, and simultaneously performs geo-localization and scene recognition in a multi-task fashion. We evaluate TransLocator on four benchmark datasets (Im2GPS [13], Im2GPS3k [14], YFCC4k [50], and YFCC26k [43]) and obtain continent-level accuracy improvements of 5.5%, 14.1%, 4.9%, and 9.9%, respectively, over the state of the art. TransLocator is also validated on real-world test images and found to be more effective than previous methods.
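The two-part attention module described in the first quoted statement (modality attention generation, then weighted concatenation) can be sketched in a few lines. This is a minimal, hypothetical illustration, not the TransLocator authors' code: it assumes the dense stack is a single hidden ReLU layer applied to the concatenated CLS tokens, and that "weighted concatenation" means scaling each branch's CLS token by its softmax score before concatenating. The helper names (`dense`, `softmax`, `attentive_fusion`, `init_params`) and the random weights are illustrative assumptions.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def dense(x: Sequence[float], weight: Matrix, bias: Sequence[float]) -> Vector:
    """Fully connected layer y = W x + b, with one row of W per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def relu(x: Sequence[float]) -> Vector:
    return [max(0.0, v) for v in x]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def attentive_fusion(
    cls_rgb: Sequence[float], cls_seg: Sequence[float], params: Dict[str, list]
) -> Tuple[Vector, Vector]:
    # Part 1: modality attention generation.
    # Assumption (hypothetical): the dense stack sees both CLS tokens concatenated,
    # so the scores depend on the input image, as the quoted text motivates.
    x = list(cls_rgb) + list(cls_seg)
    h = relu(dense(x, params["w1"], params["b1"]))
    w_mm = softmax(dense(h, params["w2"], params["b2"]))  # [w_rgb, w_seg]
    w_rgb, w_seg = w_mm
    # Part 2: weighted concatenation of the two branch representations.
    fused = [w_rgb * v for v in cls_rgb] + [w_seg * v for v in cls_seg]
    return fused, w_mm


def init_params(dim: int, hidden: int, seed: int = 0) -> Dict[str, list]:
    """Hypothetical random weights; a real model would learn these."""
    rng = random.Random(seed)

    def mat(rows: int, cols: int) -> Matrix:
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "w1": mat(hidden, 2 * dim), "b1": [0.0] * hidden,
        "w2": mat(2, hidden), "b2": [0.0, 0.0],
    }


if __name__ == "__main__":
    params = init_params(dim=4, hidden=8)
    cls_rgb = [0.5, -1.0, 0.25, 2.0]   # toy CLS token from the RGB branch
    cls_seg = [1.0, 0.0, -0.5, 0.75]   # toy CLS token from the segmentation branch
    fused, w_mm = attentive_fusion(cls_rgb, cls_seg, params)
    print("attention scores:", w_mm)
    print("fused length:", len(fused))
```

Because the two scores come from a softmax, they sum to one, so the fused vector shifts emphasis between the RGB and segmentation branches rather than rescaling both independently.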


“…Additionally, the work around flagging harmful content has focused mainly on text-based features, as they are easier to collect. Meanwhile, the usage of memes and videos (short clips and long ones) spreading toxic and harmful content has been gaining momentum [43,63,64]. We need to study the impact of bias in multi-modal content.…”

Section: Case Study: Shift In Bias Due To Knowledge-based Generalizat... (mentioning)

confidence: 99%

Handling Bias in Toxic Speech Detection: A Survey

Garg, Masud, Suresh et al. 2022

Preprint

Self Cite


The massive growth of social media usage has brought a tsunami of online toxicity in terms of hate speech, abusive posts, cyberbullying, etc. Detecting online toxicity is challenging due to its inherent subjectivity. Factors such as the context of the speech, geography, socio-political climate, and the background of the producers and consumers of the posts play a crucial role in determining whether the content can be flagged as toxic. Adopting automated toxicity detection models in production can lead to the sidelining of the very demographic and psychographic groups they aim to help in the first place. This has piqued researchers' interest in examining unintended biases and their mitigation. Due to the nascent and multi-faceted nature of the work, the literature as a whole is chaotic in its terminology, techniques, and findings. In this paper, we put together a systematic study to discuss the limitations and challenges of existing methods. We start by developing a taxonomy for categorising various unintended biases and a suite of evaluation metrics proposed to quantify such biases. We take a closer look at each proposed method for evaluating and mitigating bias in toxic speech detection. To examine the limitations of existing methods, we also conduct a case study to introduce the concept of bias shift due to knowledge-based bias mitigation methods. The survey concludes with an overview of the critical challenges, research gaps, and future directions. While reducing toxicity on online platforms continues to be an active area of research, a systematic study of various biases and their mitigation strategies will help the research community produce robust and fair models.


“…Researchers explored the online content from social media even further and began focusing on multi-modal data [27,28], including internet memes. Efforts are being made to automatically detect offensive [29] or harmful memes [30] to help the content moderators in charge of removing posts containing hate speech.…”

Section: Related Work (mentioning)

confidence: 99%

“…Several previous methods have reported using multi-modal approaches in the computational pipeline [27,28,29,30]. In our pipeline, however, we explore semantic image features in two ways: i) direct image features provided by an EfficientNetV4 [48] pretrained on the ImageNet dataset [51], and ii) features from the image encoder of CLIP [49].…”

Section: Multi-modal-multi-task Transformer (MMMT) (mentioning)

confidence: 99%

BLUE at Memotion 2.0 2022: You have my Image, my Text and my Transformer

Bucur, Cosma, Ioan-Bogdan 2022

Preprint


Memes are prevalent on the internet and continue to grow and evolve alongside our culture. An automatic understanding of memes propagating on the internet can shed light on the general sentiment and cultural attitudes of people. In this work, we present team BLUE's solution for the second edition of the MEMOTION shared task. We showcase two approaches for meme classification (i.e., sentiment, humour, offensive, sarcasm, and motivation levels): a text-only method using BERT, and a Multi-Modal-Multi-Task transformer network that operates on both the meme image and its caption to output the final scores. In both approaches, we leverage state-of-the-art pretrained models for text (BERT, Sentence Transformer) and image processing (EfficientNetV4, CLIP). We obtain first place in task A, second place in task B, and third place in task C. In addition, our team obtained the highest average score across all three tasks.
