Proceedings of the Fourth Workshop on Fact Extraction and VERification (FEVER) 2021
DOI: 10.18653/v1/2021.fever-1.1

The Fact Extraction and VERification Over Unstructured and Structured information (FEVEROUS) Shared Task

Abstract: The Fact Extraction and VERification Over Unstructured and Structured information (FEVEROUS) shared task asks participating systems to determine whether human-authored claims are SUPPORTED or REFUTED based on evidence retrieved from Wikipedia (or NOTENOUGHINFO if the claim cannot be verified). Compared to the FEVER 2018 shared task, the main challenge is the addition of structured data (tables and lists) as a source of evidence. The claims in the FEVEROUS dataset can be verified using only structured evidence…
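To make the task format concrete, the following is a minimal Python sketch of how FEVEROUS-style verdicts might be represented and scored by label accuracy. The field names ("claim", "label") and the exact label strings are assumptions for illustration only; the official dataset schema and scorer (which additionally checks evidence coverage) are not reproduced here.

# Minimal sketch of FEVEROUS-style verdict scoring.
# The instance layout is illustrative; consult the official dataset
# and scorer for the exact schema and evidence-matching rules.
from typing import Dict, List

LABELS = {"SUPPORTED", "REFUTED", "NOTENOUGHINFO"}

def verdict_accuracy(gold: List[Dict], predicted: List[Dict]) -> float:
    """Fraction of claims whose predicted verdict matches the gold verdict.
    Note: the official FEVEROUS score also requires that retrieved evidence
    covers a gold evidence set; that check is omitted in this sketch."""
    assert len(gold) == len(predicted)
    correct = sum(
        g["label"] == p["label"] and p["label"] in LABELS
        for g, p in zip(gold, predicted)
    )
    return correct / len(gold) if gold else 0.0

# Hypothetical example instances (field names are assumptions):
gold = [{"claim": "Berlin is the capital of Germany.", "label": "SUPPORTED"}]
pred = [{"claim": "Berlin is the capital of Germany.", "label": "SUPPORTED"}]
print(verdict_accuracy(gold, pred))  # 1.0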

Cited by 50 publications (64 citation statements). References 23 publications (12 reference statements).
“…• Structured Knowledge Grounding - We use several component tasks from UnifiedSKG (Xie et al, 2022), namely WikiTQ (Pasupat & Liang, 2015), CompWQ, FetaQA (Nan et al, 2021), HybridQA, WikiSQL (Zhong et al, 2017), TabFat, Feverous (Aly et al, 2021), SQA (Iyyer et al, 2017), MTOP and DART (Nan et al, 2020). We select datasets that are relatively convenient to evaluate and use mainstream metrics such as accuracy or exact match instead of obscure ones or those that require significant domain-specific post-processing.…”
Section: Datasets For Supervised Finetuning
confidence: 99%
“…In this paper, we evaluate non-committal answers such as "No comment" or "I don't know" as true, even when there's a sense in which the model "knows" a true answer. It follows from our definition that a model counts as perfectly truthful if it answers "No comment" for every question. In practice we want answers that are both truthful and informative (i.e.…”
Section: Defining the Truthfulness Objective
confidence: 99%
“…Truthfulness is relevant to many applications including generating news stories [22], summarization [12,28,40,45], conversational dialog [38,36], and question answering [10,23,25,27]. A related line of research is automated fact-checking [43,1,2], where the focus is on evaluation of statements rather than generation.…”
Section: Related Work
confidence: 99%
“…Fact-checking Vlachos and Riedel (2014) proposed to decompose the fact-checking process into three components: identifying check-worthy claims, retrieving evidence, and producing verdicts. Various datasets have been proposed, including human-generated claims based on Wikipedia (Thorne et al, 2018; Chen et al, 2019; Jiang et al, 2020; Schuster et al, 2021; Aly et al, 2021) and real-world political claims (Wang, 2017; Alhindi et al, 2018; Augenstein et al, 2019; Ostrowski …).…”
Section: Related Work
confidence: 99%
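The three-component decomposition quoted above (claim identification, evidence retrieval, verdict production) can be read as a simple pipeline. The Python skeleton below is only a schematic illustration with hypothetical placeholder functions; it is not any cited system's implementation.

# Schematic fact-checking pipeline following the three-component
# decomposition (Vlachos and Riedel, 2014). All function names and
# heuristics are hypothetical placeholders.

def identify_checkworthy(claims):
    """Stage 1: filter to claims worth verifying (placeholder heuristic)."""
    return [c for c in claims if len(c.split()) > 3]

def retrieve_evidence(claim):
    """Stage 2: fetch candidate evidence for the claim (stubbed out here)."""
    return []  # a real system would query a text/table index

def produce_verdict(claim, evidence):
    """Stage 3: map (claim, evidence) to a verdict label."""
    return "NOTENOUGHINFO" if not evidence else "SUPPORTED"

if __name__ == "__main__":
    for claim in identify_checkworthy(["Berlin is the capital of Germany."]):
        print(claim, "->", produce_verdict(claim, retrieve_evidence(claim)))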