Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022) 2022
DOI: 10.18653/v1/2022.semeval-1.107

AIDA-UPM at SemEval-2022 Task 5: Exploring Multimodal Late Information Fusion for Multimedia Automatic Misogyny Identification

Abstract: This paper describes the multimodal late fusion model proposed in the SemEval-2022 Multimedia Automatic Misogyny Identification (MAMI) task. The main contribution of this paper is the exploration of different late fusion methods to boost the performance of the combination based on the Transformer-based model and Convolutional Neural Networks (CNNs) for text and image, respectively. Additionally, our findings contribute to a better understanding of the effects of different image preprocessing methods for meme c…

Citations: cited by 2 publications (1 citation statement)
References: 21 publications
“…The proposed model uses the bilinear interaction layer to fuse the image and text features. Huertas et al [40] explored several late fusion techniques in order to enhance the effectiveness of the fusion approach utilising a model based on transformers and convolutional neural networks (CNNs). The proposed method encompasses a multimodal strategy that integrates several attributes (such as logits, probabilities, and embeddings) derived from both textual and visual elements, employing a late fusion technique.…”
Section: Related Work (mentioning)
confidence: 99%
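
The late fusion idea mentioned in the citation statement can be illustrated with a minimal sketch. It assumes each modality's classifier emits logits over two classes and fuses them by a weighted average of the resulting probabilities. This is only one of several possible late fusion rules and is not claimed to be the authors' exact method. The function names, the weight `text_weight`, and the example logits are all hypothetical.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable form)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(text_logits, image_logits, text_weight=0.5):
    """Fuse two modalities by a weighted average of class probabilities.

    text_weight is a hypothetical tunable parameter in [0, 1];
    the image branch receives the remaining weight.
    """
    if len(text_logits) != len(image_logits):
        raise ValueError("both modalities must score the same classes")
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("text_weight must be in [0, 1]")
    text_probs = softmax(text_logits)
    image_probs = softmax(image_logits)
    return [
        text_weight * t + (1.0 - text_weight) * i
        for t, i in zip(text_probs, image_probs)
    ]


# Made-up logits over the classes [not misogynous, misogynous].
text_logits = [0.2, 1.4]   # e.g. from a Transformer-based text model
image_logits = [0.9, 0.3]  # e.g. from a CNN image model

fused = late_fusion(text_logits, image_logits, text_weight=0.6)
predicted = max(range(len(fused)), key=fused.__getitem__)
```

Each branch is trained and run independently, and only its outputs are combined, which is what distinguishes late fusion from fusing features inside a joint model. The same pattern could be applied to logits or embeddings instead of probabilities, for example by concatenating them and training a small classifier on top.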