Findings of the Association for Computational Linguistics: NAACL 2022
DOI: 10.18653/v1/2022.findings-naacl.72

MM-Claims: A Dataset for Multimodal Claim Detection in Social Media

Abstract: In recent years, the problem of misinformation on the web has become widespread across languages, countries, and various social media platforms. Although there has been much work on automated fake news detection, the role of images and their variety are not well explored. In this paper, we investigate the roles of image and text at an earlier stage of the fake news detection pipeline, called claim detection. For this purpose, we introduce a novel dataset, MM-Claims, which consists of tweets and corresponding i…

Cited by 7 publications (4 citation statements). References 7 publications.
“…Check square [45] builds a KD-tree based on embeddings for claim texts and titles to retrieve the 1,000 most similar verified claims for an input claim. We report results for two of their variants which performed better than their primary submission: one using Sentence-BERT-Large pre-trained on SNLI with MAX tokens, fine-tuned with triplet loss, and the other one using multilingual DistilBERT-embeddings without fine-tuning (distmult).…”
Section: Claim Retrieval Methods and Baselines
confidence: 99%
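The retrieval setup described in this statement can be sketched roughly as follows: embed all verified claims, index the embeddings in a KD-tree, and query it with the embedding of an input claim. This is a minimal illustration assuming the sentence-transformers and SciPy libraries; the checkpoint name, toy verified claims, and k are placeholders rather than the exact configuration of Check square [45], which retrieves the 1,000 most similar verified claims over claim texts and titles.

```python
# Minimal sketch (assumed setup): retrieve the verified claims closest to an
# input claim via a KD-tree over sentence embeddings. Model name, toy data,
# and k are illustrative placeholders.
from scipy.spatial import cKDTree
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("distiluse-base-multilingual-cased-v1")  # assumed checkpoint

verified_claims = [
    "There is no evidence that 5G spreads COVID-19.",
    "The WHO has not recommended garlic as a cure for influenza.",
    "Drinking bleach does not protect against viral infections.",
]

# Build the KD-tree once over the embeddings of all verified claims.
claim_embeddings = model.encode(verified_claims, normalize_embeddings=True)
tree = cKDTree(claim_embeddings)

def retrieve(input_claim: str, k: int = 3):
    """Return the k verified claims nearest to the input claim in embedding space."""
    query = model.encode([input_claim], normalize_embeddings=True)
    distances, indices = tree.query(query, k=k)
    return [(verified_claims[i], float(d)) for i, d in zip(indices[0], distances[0])]

print(retrieve("5G towers are spreading the coronavirus"))
```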
“…Our results generally verify the observations made in the literature that the choice of language models for this task greatly impacts the results: For Check square, the authors found on the 2021-tweets data that multilingual DistilBERT embeddings without fine-tuning outperformed fine-tuned monolingual Sentence-BERT models. For the former, fine-tuning using their triplet loss methodology hurts performance while it helps in the case of Sentence-BERT [45]. Similarly, [39] found on the same dataset that their RoBERTa model fine-tuned on triplets performed worse than monolingual DistilBERT without fine-tuning.…”
Section: Performance On Different Datasets
confidence: 98%
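The triplet-loss fine-tuning mentioned here can be illustrated with a short sketch, assuming the sentence-transformers library; the checkpoint, the example triplet, and the hyperparameters are placeholders and not the settings reported in [45].

```python
# Minimal sketch (assumed setup): fine-tune a Sentence-BERT model with triplet
# loss using sentence-transformers. Checkpoint, triplet, and hyperparameters
# are illustrative only.
from torch.utils.data import DataLoader
from sentence_transformers import InputExample, SentenceTransformer, losses

model = SentenceTransformer("sentence-transformers/bert-large-nli-max-tokens")  # assumed checkpoint

# Each example is (anchor input claim, matching verified claim, non-matching verified claim).
train_examples = [
    InputExample(texts=[
        "5G towers are spreading the coronavirus",                # anchor
        "There is no evidence that 5G spreads COVID-19.",         # positive
        "The WHO has not recommended garlic as a cure for flu.",  # negative
    ]),
]

train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=8)
train_loss = losses.TripletLoss(model=model)

# Pull anchors closer to positives than to negatives in embedding space.
model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=10)
```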
“…The term fauxtography was first coined in journalism for images manipulated to "convey a questionable (or outright false) sense of the events they seem to depict" (Cooper, 2007; Kalb and Saivetz, 2007). Other terms used in the literature for manipulated content are fake (Cheema et al., 2022), forgery (Cozzolino et al., 2021), and splice (Zampoglou et al., 2015).…”
Section: Task Formulation
confidence: 99%
“…Claim detection is typically framed as a classification task. Models predict if a claim is checkable or check-worthy (Prabhakar et al., 2021; Cheema et al., 2022; Barrón-Cedeño et al., 2023). The verdict for factual-verifiability is often binary (Jin et al., 2017; Shang et al., 2021).…”
Section: Stage 1: Claim Detection and Extraction
confidence: 99%
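A text-only sketch of this binary framing, assuming a Hugging Face transformers backbone, is shown below; the checkpoint, label convention, and example tweet are illustrative, the classification head would still need fine-tuning on labeled claim data, and the image modality that MM-Claims pairs with each tweet is omitted.

```python
# Minimal sketch (assumed setup): claim detection as binary text classification
# with a transformer encoder. Checkpoint, label meanings, and the example tweet
# are placeholders; the classification head is untrained until fine-tuned on
# labeled claim data, and the image modality of MM-Claims is omitted.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

checkpoint = "bert-base-uncased"  # assumed backbone
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

tweet = "Breaking: drinking hot water cures the virus, doctors confirm."
inputs = tokenizer(tweet, return_tensors="pt", truncation=True)

with torch.no_grad():
    logits = model(**inputs).logits

# Convention in this sketch: label 1 = check-worthy claim, label 0 = not a claim.
prediction = logits.argmax(dim=-1).item()
print("claim" if prediction == 1 else "no claim")
```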