Findings of the Association for Computational Linguistics: NAACL 2022
DOI: 10.18653/v1/2022.findings-naacl.72

MM-Claims: A Dataset for Multimodal Claim Detection in Social Media

Abstract: In recent years, the problem of misinformation on the web has become widespread across languages, countries, and various social media platforms. Although there has been much work on automated fake news detection, the role of images and their variety are not well explored. In this paper, we investigate the roles of image and text at an earlier stage of the fake news detection pipeline, called claim detection. For this purpose, we introduce a novel dataset, MM-Claims, which consists of tweets and corresponding i…

Cited by 7 publications (4 citation statements). References 7 publications.
“…Check square [45] builds a KD-tree based on embeddings for claim texts and titles to retrieve the 1,000 most similar verified claims for an input claim. We report results for two of their variants which performed better than their primary submission: one using Sentence-BERT-Large pre-trained on SNLI with MAX tokens, fine-tuned with triplet loss, and the other one using multilingual DistilBERT-embeddings without fine-tuning (distmult).…”
Section: Claim Retrieval Methods and Baselines
confidence: 99%
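The retrieval setup described in this statement can be sketched roughly as follows: embed all verified claims, index the embeddings in a KD-tree, and query it with the embedding of an input claim. This is a minimal illustration assuming the sentence-transformers and SciPy libraries; the checkpoint name, toy verified claims, and k are placeholders rather than the exact configuration of Check square [45], which retrieves the 1,000 most similar verified claims over claim texts and titles.

```python
# Minimal sketch (assumed setup): retrieve the verified claims closest to an
# input claim via a KD-tree over sentence embeddings. Model name, toy data,
# and k are illustrative placeholders.
from scipy.spatial import cKDTree
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("distiluse-base-multilingual-cased-v1")  # assumed checkpoint

verified_claims = [
    "There is no evidence that 5G spreads COVID-19.",
    "The WHO has not recommended garlic as a cure for influenza.",
    "Drinking bleach does not protect against viral infections.",
]

# Build the KD-tree once over the embeddings of all verified claims.
claim_embeddings = model.encode(verified_claims, normalize_embeddings=True)
tree = cKDTree(claim_embeddings)

def retrieve(input_claim: str, k: int = 3):
    """Return the k verified claims nearest to the input claim in embedding space."""
    query = model.encode([input_claim], normalize_embeddings=True)
    distances, indices = tree.query(query, k=k)
    return [(verified_claims[i], float(d)) for i, d in zip(indices[0], distances[0])]

print(retrieve("5G towers are spreading the coronavirus"))
```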
“…Our results generally verify the observations made in the literature that the choice of language models for this task greatly impacts the results: For Check square, the authors found on the 2021-tweets data that multilingual DistilBERT embeddings without fine-tuning outperformed fine-tuned monolingual Sentence-BERT models. For the former, fine-tuning using their triplet loss methodology hurts performance while it helps in the case of Sentence-BERT [45]. Similarly, [39] found on the same dataset that their RoBERTa model fine-tuned on triplets performed worse than monolingual DistilBERT without fine-tuning.…”
Section: Performance On Different Datasets
confidence: 98%
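The triplet-loss fine-tuning mentioned here can be illustrated with a short sketch, assuming the sentence-transformers library; the checkpoint, the example triplet, and the hyperparameters are placeholders and not the settings reported in [45].

```python
# Minimal sketch (assumed setup): fine-tune a Sentence-BERT model with triplet
# loss using sentence-transformers. Checkpoint, triplet, and hyperparameters
# are illustrative only.
from torch.utils.data import DataLoader
from sentence_transformers import InputExample, SentenceTransformer, losses

model = SentenceTransformer("sentence-transformers/bert-large-nli-max-tokens")  # assumed checkpoint

# Each example is (anchor input claim, matching verified claim, non-matching verified claim).
train_examples = [
    InputExample(texts=[
        "5G towers are spreading the coronavirus",                # anchor
        "There is no evidence that 5G spreads COVID-19.",         # positive
        "The WHO has not recommended garlic as a cure for flu.",  # negative
    ]),
]

train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=8)
train_loss = losses.TripletLoss(model=model)

# Pull anchors closer to positives than to negatives in embedding space.
model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=10)
```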
“…The term fauxtography was first coined in journalism for images manipulated to "convey a questionable (or outright false) sense of the events they seem to depict" (Cooper, 2007; Kalb and Saivetz, 2007). Other terms used in the literature for manipulated content are fake (Cheema et al., 2022), forgery (Cozzolino et al., 2021), and splice (Zampoglou et al., 2015).…”
Section: Task Formulation
confidence: 99%
“…Claim detection is typically framed as a classification task. Models predict if a claim is checkable or check-worthy (Prabhakar et al., 2021; Cheema et al., 2022; Barrón-Cedeño et al., 2023). The verdict for factual-verifiability is often binary (Jin et al., 2017; Shang et al., 2021).…”
Section: Stage 1: Claim Detection and Extraction
confidence: 99%
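A text-only sketch of this binary framing, assuming a Hugging Face transformers backbone, is shown below; the checkpoint, label convention, and example tweet are illustrative, the classification head would still need fine-tuning on labeled claim data, and the image modality that MM-Claims pairs with each tweet is omitted.

```python
# Minimal sketch (assumed setup): claim detection as binary text classification
# with a transformer encoder. Checkpoint, label meanings, and the example tweet
# are placeholders; the classification head is untrained until fine-tuned on
# labeled claim data, and the image modality of MM-Claims is omitted.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

checkpoint = "bert-base-uncased"  # assumed backbone
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

tweet = "Breaking: drinking hot water cures the virus, doctors confirm."
inputs = tokenizer(tweet, return_tensors="pt", truncation=True)

with torch.no_grad():
    logits = model(**inputs).logits

# Convention in this sketch: label 1 = check-worthy claim, label 0 = not a claim.
prediction = logits.argmax(dim=-1).item()
print("claim" if prediction == 1 else "no claim")
```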