FiLMing Multimodal Sarcasm Detection with Attention
2021 · DOI: 10.1007/978-3-030-92307-5_21

Cited by 5 publications (6 citation statements) · References 10 publications
“…Noticing this issue, research interest has recently shifted to the task of multimodal sarcasm detection (MSD), whose key objective is to accurately detect the inter- and intra-modal incongruities in someone's implied sentiment expression within the given context. Early approaches incorporated fusion techniques that combined entire text and image features through a concatenation operation (Pan et al. 2020) or an attention mechanism (Gupta et al. 2021). Despite their considerable progress, they overlook the possibility that sarcastic information may be expressed in local segments of the text and certain regions of the image.…”
Section: SOTA Model
confidence: 99%
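The two early fusion strategies this statement refers to can be illustrated with a minimal sketch: one module that simply concatenates global text and image vectors, and one that lets the text attend over image region features. All class names, dimensions, and the two-way classifier head are illustrative assumptions, not the actual architectures of Pan et al. (2020) or Gupta et al. (2021).

```python
# Minimal sketch of the two early fusion strategies mentioned above:
# (a) concatenating global text and image features, and (b) letting the
# text attend over image region features. Names and sizes are illustrative.
import torch
import torch.nn as nn


class ConcatFusion(nn.Module):
    """Fuse one global text vector and one global image vector by concatenation."""

    def __init__(self, text_dim=768, img_dim=2048, hidden=512):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(text_dim + img_dim, hidden), nn.ReLU(), nn.Linear(hidden, 2)
        )

    def forward(self, text_vec, img_vec):
        return self.classifier(torch.cat([text_vec, img_vec], dim=-1))


class AttentionFusion(nn.Module):
    """Use the text vector as a query over image region features (cross-attention)."""

    def __init__(self, text_dim=768, img_dim=2048, hidden=512):
        super().__init__()
        self.q = nn.Linear(text_dim, hidden)
        self.k = nn.Linear(img_dim, hidden)
        self.v = nn.Linear(img_dim, hidden)
        self.classifier = nn.Linear(text_dim + hidden, 2)

    def forward(self, text_vec, img_regions):
        # img_regions: (batch, num_regions, img_dim)
        q = self.q(text_vec).unsqueeze(1)                    # (B, 1, H)
        k, v = self.k(img_regions), self.v(img_regions)      # (B, R, H)
        attn = torch.softmax(q @ k.transpose(1, 2) / k.size(-1) ** 0.5, dim=-1)
        attended = (attn @ v).squeeze(1)                     # (B, H)
        return self.classifier(torch.cat([text_vec, attended], dim=-1))
```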
“…Textual Encoding. To better model the semantic information in the textual sentence, we feed it into the pre-trained language encoder RoBERTa (Liu et al. 2019), which has achieved appreciable results in multimodal language understanding tasks (Cao et al. 2022; Gupta et al. 2021)…”
Section: MSD Model Initialization
confidence: 99%
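The textual-encoding step described in this statement can be sketched with the Hugging Face `transformers` library: the sentence is run through pre-trained RoBERTa and the per-token hidden states are kept, with the leading <s>/CLS vector as a sentence summary. The checkpoint name and example sentence are illustrative; the cited work may use a different variant or pooling strategy.

```python
# Minimal sketch of encoding a sentence with pre-trained RoBERTa.
import torch
from transformers import RobertaTokenizer, RobertaModel

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
encoder = RobertaModel.from_pretrained("roberta-base")

sentence = "what a wonderful day to be stuck in traffic"   # illustrative example
inputs = tokenizer(sentence, return_tensors="pt", truncation=True)

with torch.no_grad():
    outputs = encoder(**inputs)

token_states = outputs.last_hidden_state   # (1, seq_len, 768) per-token features
sentence_vec = token_states[:, 0]          # <s>/CLS vector as sentence summary
```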
“…Arevalo et al [43] propose the Gated Multimodal Unit (GMU) model, which controls the influence of input modalities on unit activation levels for data fusion. Gupta et al [44] introduce a Collaborative Attention Model based on RoBERTa and FiLMed ResNet, addressing the issue of visual-text inconsistency through joint attention mechanisms.…”
Section: B. Multimodal Sentiment Analysis
confidence: 99%
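A two-modality Gated Multimodal Unit along the lines described in [43] can be sketched as follows: each modality is projected through a tanh layer, and a sigmoid gate computed from both inputs decides how much each modality contributes to the fused representation. Layer names and dimensions are illustrative assumptions, not the exact configuration of the cited work.

```python
# Minimal sketch of a two-modality Gated Multimodal Unit (GMU).
import torch
import torch.nn as nn


class GatedMultimodalUnit(nn.Module):
    def __init__(self, text_dim=768, img_dim=2048, hidden=512):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, hidden)
        self.img_proj = nn.Linear(img_dim, hidden)
        self.gate = nn.Linear(text_dim + img_dim, hidden)

    def forward(self, text_vec, img_vec):
        h_t = torch.tanh(self.text_proj(text_vec))   # text hypothesis
        h_v = torch.tanh(self.img_proj(img_vec))     # image hypothesis
        # gate decides, per hidden unit, how much each modality contributes
        z = torch.sigmoid(self.gate(torch.cat([text_vec, img_vec], dim=-1)))
        return z * h_t + (1 - z) * h_v               # gated fusion
```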
“…Gupta et al. proposed FiLMing, which uses FiLMed ResNet blocks to modulate the input image features with the text features through feature-wise affine transformations (FiLM) in order to capture multimodal information. The model uses the output of the CLS token from RoBERTa for the final prediction [16].…”
Section: Multi-modal Sarcasm Detection
confidence: 99%
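The FiLM-style modulation referenced in [16] can be sketched as a layer in which a text-derived conditioning vector predicts per-channel scale (gamma) and shift (beta) parameters that are applied to an image feature map. Shapes, names, and the random example inputs below are illustrative only, not the exact FiLMed ResNet configuration of the cited model.

```python
# Minimal sketch of feature-wise linear modulation (FiLM) of image features
# by a text-derived conditioning vector.
import torch
import torch.nn as nn


class FiLMLayer(nn.Module):
    def __init__(self, cond_dim=768, num_channels=256):
        super().__init__()
        # one linear layer predicts both gamma and beta from the text vector
        self.film = nn.Linear(cond_dim, 2 * num_channels)

    def forward(self, feature_map, cond_vec):
        # feature_map: (B, C, H, W) image features; cond_vec: (B, cond_dim) text features
        gamma, beta = self.film(cond_vec).chunk(2, dim=-1)
        gamma = gamma.unsqueeze(-1).unsqueeze(-1)    # (B, C, 1, 1)
        beta = beta.unsqueeze(-1).unsqueeze(-1)
        return gamma * feature_map + beta            # channel-wise affine modulation


# Example: modulate a ResNet-style feature map with a text (e.g. CLS) vector.
film = FiLMLayer()
img_feats = torch.randn(2, 256, 14, 14)
text_cls = torch.randn(2, 768)
modulated = film(img_feats, text_cls)                # (2, 256, 14, 14)
```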