Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021
DOI: 10.18653/v1/2021.acl-long.165
COVID-Fact: Fact Extraction and Verification of Real-World Claims on COVID-19 Pandemic

Abstract: We introduce a FEVER-like dataset COVID-Fact of 4,086 claims concerning the COVID-19 pandemic. The dataset contains claims, evidence for the claims, and contradictory claims refuted by the evidence. Unlike previous approaches, we automatically detect true claims and their source articles and then generate counter-claims using automatic methods rather than employing human annotators. Along with our constructed resource, we formally present the task of identifying relevant evidence for the claims and verifying …

Cited by 37 publications (53 citation statements)
References 26 publications
“…Then, these systems evaluate whether the retrieved evidence sentences validate or contradict the claim, or whether there is not enough information to make a judgment. More recently, the SCIFACT (Wadden et al. 2020) and COVIDFACT (Saakyan, Chakrabarty, and Muresan 2021) benchmarks re-purposed this framework for the scientific domain by releasing datasets of medical claims to be verified against scientific content (Wang et al. 2020). While this framework has led to impressive advances in fact verification performance (Ye et al. 2020; Pradeep et al. 2021), current benchmarks assume that the available evidence database contains only valid, factual information.…”
Section: Claim Verification
confidence: 99%
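The three-way verdict described in the statement above (supports, refutes, or not enough information) maps naturally onto natural language inference (NLI). The sketch below shows that verification step with an off-the-shelf NLI model (roberta-large-mnli from the Hugging Face hub); the label-to-verdict mapping and the example claim/evidence pair are illustrative assumptions, not the exact pipeline of any cited system.

```python
# Minimal sketch: three-way claim verification via an off-the-shelf NLI model.
# Assumption: we reuse roberta-large-mnli and map its NLI labels onto
# fact-verification verdicts; this is NOT the cited systems' exact setup.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL = "roberta-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)

# Assumed mapping from NLI labels to fact-verification verdicts.
VERDICT = {"ENTAILMENT": "SUPPORTED",
           "CONTRADICTION": "REFUTED",
           "NEUTRAL": "NOT ENOUGH INFO"}

def verify(claim: str, evidence: str) -> str:
    # Treat the evidence sentence as the premise and the claim as the hypothesis.
    inputs = tokenizer(evidence, claim, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    label = model.config.id2label[int(logits.argmax(dim=-1))]
    return VERDICT[label]

# Hypothetical claim/evidence pair for illustration.
print(verify("Masks reduce transmission of respiratory viruses.",
             "A meta-analysis found that mask wearing lowered infection rates."))
```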
“…During document retrieval, documents in the evidence repository that are relevant to the claim are selected. Existing methods typically use information retrieval methods to rank documents based on relevance (Wadden et al. 2020) or use public APIs of commercial document indices (Hanselowski et al. 2019; Saakyan, Chakrabarty, and Muresan 2021) to crawl related documents. In the sentence retrieval stage, individual sentences from these retrieved documents are selected with respect to their relevance to the claim, often using textual entailment (Hanselowski et al. 2019) or sentence similarity methods.…”
Section: Evidence Retrieval
confidence: 99%
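As a concrete illustration of this two-stage pipeline, the sketch below first ranks whole documents and then individual sentences by TF-IDF cosine similarity to the claim. TF-IDF here is a simple stand-in for the IR rankers and sentence-similarity methods cited above, and the corpus, claim, and function names are made up for the example.

```python
# Minimal sketch of two-stage evidence retrieval: document ranking, then
# sentence ranking, both via TF-IDF cosine similarity (a stand-in for the
# IR and sentence-similarity methods cited in the statement above).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def retrieve(claim, documents, k_docs=2, k_sents=3):
    # Stage 1: document retrieval -- rank documents by relevance to the claim.
    doc_vec = TfidfVectorizer().fit(documents + [claim])
    doc_scores = cosine_similarity(doc_vec.transform([claim]),
                                   doc_vec.transform(documents))[0]
    top_docs = sorted(range(len(documents)), key=lambda i: -doc_scores[i])[:k_docs]

    # Stage 2: sentence retrieval -- score sentences from the retrieved
    # documents against the claim (naive sentence split on '. ').
    sentences = [s for i in top_docs for s in documents[i].split(". ") if s]
    sent_vec = TfidfVectorizer().fit(sentences + [claim])
    sent_scores = cosine_similarity(sent_vec.transform([claim]),
                                    sent_vec.transform(sentences))[0]
    return sorted(zip(sentences, sent_scores), key=lambda p: -p[1])[:k_sents]

# Made-up mini-corpus for illustration.
corpus = ["Vitamin D supplementation showed no effect on COVID-19 severity. "
          "The trials were small.",
          "Mask mandates correlated with lower case growth. "
          "The effect varied by region."]
print(retrieve("Vitamin D does not reduce COVID-19 severity", corpus))
```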
“…Naturally, search based on ambiguous claims can yield poor-quality search results, and thus insufficient evidence, if such claims are included in a dataset meant to facilitate research on evidence-based fact-checking. To the best of our knowledge, this has not been considered in most existing datasets [28, 1, 19, 23, 3], though recent work on fact checking related to COVID-19 did usefully evaluate pipeline systems using Google as a baseline engine [21].…”
Section: Claim Ambiguity
confidence: 99%
“…The rise of misinformation has also prompted a large body of work, especially in natural language processing (NLP), on the automatic fact checking of claims [24, 20, 3, 11, 26, 10, 21]. Despite tremendous progress, however, the task remains quite challenging.…”
Section: Introduction
confidence: 99%