AIDA-UPM at SemEval-2022 Task 5: Exploring Multimodal Late Information Fusion for Multimedia Automatic Misogyny Identification

Huertas-García, Álvaro; Liz, Helena; Villar-Rodríguez, Guillermo; Martín, Alejandro; Huertas-Tato, Javier; Camacho, David

doi:10.18653/v1/2022.semeval-1.107

Cited by 2 publications

(1 citation statement)

References 21 publications

“…The proposed model uses the bilinear interaction layer to fuse the image and text features. Huertas et al [40] explored several late fusion techniques in order to enhance the effectiveness of the fusion approach utilising a model based on transformers and convolutional neural networks (CNNs). The proposed method encompasses a multimodal strategy that integrates several attributes (such as logits, probabilities, and embeddings) derived from both textual and visual elements, employing a late fusion technique.…”

Section: Related Work (mentioning)

confidence: 99%

DeVi Deep Learning Framework for Misogyny Identification in Multimodal Data

Singh, Das, Manderna et al. 2023

Preprint


In recent times, there has been a notable upsurge in the frequency of memes across a wide range of social media platforms. Memes provide amusement to people with their humour, but unfortunately, some memes exploit this humour as a cover to spread misogynistic and hateful content targeting women on online platforms. Most of the previously proposed methods for detecting misogyny have primarily concentrated on either textual or visual content. However, there is a noticeable dearth of research on analysing multimodal data that combines both images and text. We propose a DeVi framework comprising DeBERTa and Vision Transformer with an attention-based late fusion strategy for automatic misogyny identification in memes. We evaluated the proposed framework on two different subtasks provided in SemEval-2022 task 5 on the MAMI dataset. Subtask A is a misogynous meme identification task, and subtask B is to identify the type of misogyny, which is a multilabel classification task. The proposed framework achieved an F1-score of 0.865 and 0.783 on subtask A and B, respectively. The experimental findings clearly illustrate that the DeVi framework we propose outperforms existing multimodal models in both subtasks, showcasing its superior performance. This highlights the effectiveness and adaptability of the DeVi framework.
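As a rough illustration of what an attention-based late fusion step can look like, the sketch below weights a text embedding and an image embedding by softmax attention scores and sums them. It is not the DeVi architecture: the function names, the `query` vector (standing in for learned parameters), and the toy two-dimensional embeddings are all assumptions made for the example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_late_fusion(text_emb, image_emb, query):
    """Fuse two same-length embeddings using softmax attention weights.

    `query` is a hypothetical stand-in for a learned scoring vector;
    in a trained model it would come from the network's parameters.
    """
    modalities = [text_emb, image_emb]
    # Score each modality by its dot product with the query vector.
    scores = [sum(q * x for q, x in zip(query, emb)) for emb in modalities]
    weights = softmax(scores)
    # Weighted sum of the per-modality embeddings, dimension by dimension.
    fused = [
        sum(w * emb[i] for w, emb in zip(weights, modalities))
        for i in range(len(query))
    ]
    return fused, weights


# Toy inputs: in practice these would be encoder outputs (e.g. text and image models).
fused, weights = attention_late_fusion([1.0, 0.0], [0.0, 1.0], [1.0, 0.0])
print(weights, fused)
```

With a query aligned to the text embedding, the text modality receives the larger weight; a zero query yields equal weights, which reduces to plain averaging.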
