Proceedings of the First Workshop on Fact Extraction and VERification (FEVER) 2018
DOI: 10.18653/v1/w18-5501
The Fact Extraction and VERification (FEVER) Shared Task

Abstract: We present the results of the first Fact Extraction and VERification (FEVER) Shared Task. The task challenged participants to classify whether human-written factoid claims could be SUPPORTED or REFUTED using evidence retrieved from Wikipedia. We received entries from 23 competing teams, 19 of which scored higher than the previously published baseline. The best performing system achieved a FEVER score of 64.21%. In this paper, we present the results of the shared task and a summary of the systems, highlighting …
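
For context on the headline number, the FEVER score is stricter than label accuracy: a claim counts as correct only if the predicted label matches the gold label and, for SUPPORTED/REFUTED claims, the submitted evidence contains at least one complete gold evidence set. The sketch below illustrates that definition; the data layout is an assumption of this sketch, and the official scorer lives at https://github.com/sheffieldnlp/fever-scorer.

```python
# Minimal sketch of the FEVER score (illustrative; the input layout is an
# assumption, not the official scorer's format).

def fever_score(predictions):
    """predictions: dicts with 'label', 'gold_label' (strings),
    'evidence' (predicted (page, sentence_id) pairs) and
    'gold_evidence' (a list of gold evidence sets, each a set of
    (page, sentence_id) pairs)."""
    correct = 0
    for p in predictions:
        if p["label"] != p["gold_label"]:
            continue  # wrong label: never counted, regardless of evidence
        if p["gold_label"] == "NOT ENOUGH INFO":
            correct += 1  # NEI claims carry no evidence requirement
        elif any(gold <= set(p["evidence"]) for gold in p["gold_evidence"]):
            correct += 1  # at least one gold evidence set fully retrieved
    return correct / len(predictions)
```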

Cited by 183 publications (206 citation statements). References 14 publications (18 reference statements).
“…To aid with preparing their submission of 1000 instances, the organizers hosted a web-based sandbox. Breakers had access to 8 systems (4 top systems from the first FEVER shared task (Thorne et al., 2018b), the baseline from (Thorne et al., 2018a) and 3 new qualifying submissions from the 'Build-It' phase) that were hosted by the shared task organisers. Participants could experiment with attacks by submitting small samples of 50 instances for scoring twice a day via a shared task portal which returned FEVER scores of all the hosted systems.…”
Section: Task Phases
confidence: 99%
“…The first Fact Extraction and VERification (FEVER) shared task (Thorne et al., 2018b) focused on building systems that predict whether a textual claim is SUPPORTED or REFUTED given evidence (see (Thorne et al., 2018a) for a task description), or NOTENOUGHINFORMATION in case Wikipedia does not have appropriate evidence to verify it. As automated systems for fact checking have potentially sensitive applications, it is important to study the vulnerabilities of these systems, as well as the deficiencies of the datasets they are trained on.…”
Section: Introduction
confidence: 99%
“…In the final component, we classify each of the extracted candidate evidence sentences in terms of whether they support, refute or are just related to the claim (other). We employ the natural language inference (NLI) model from the Hexa-F system [15] (one of the best performing systems in the FEVER shared task [12]) to classify the relation between the selected evidence sentences and the claim, one of supports/refutes/other, and a similar label which expresses whether the combined set of evidence sentences supports, refutes or is simply related to the claim.…”
Section: Fact Checking System
confidence: 99%
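
The component described in this excerpt lends itself to a short illustration. The sketch below is a hypothetical stand-in, not the Hexa-F model or its aggregation rule: classify_nli is a placeholder for a trained NLI classifier, and the precedence rule for combining per-sentence labels is an assumption of this sketch.

```python
# Hypothetical sketch of the claim-validation component quoted above:
# an NLI model labels each (claim, evidence sentence) pair as
# supports/refutes/other, and the per-sentence labels are combined into
# a claim-level label. classify_nli is a placeholder, not Hexa-F.

def classify_nli(claim: str, evidence: str) -> str:
    """Stand-in for a trained NLI model; returns 'supports', 'refutes' or 'other'."""
    raise NotImplementedError("plug in a fine-tuned sentence-pair classifier")

def combined_label(claim: str, evidence_sentences: list[str]) -> str:
    labels = [classify_nli(claim, s) for s in evidence_sentences]
    # Assumed precedence rule: any supporting sentence wins, then refuting.
    if "supports" in labels:
        return "supports"
    if "refutes" in labels:
        return "refutes"
    return "other"
```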
“…Claim Validation Reasoning about the validity of a particular claim can be separated into three sub-tasks: document retrieval to find documents related to the claim, ED to find the relevant pieces of evidence that support or contradict the claim, and Textual Entailment (TE) to determine whether the claim follows from the evidence. The FEVER shared tasks follow this approach (Thorne et al., 2018; Thorne and Vlachos, 2019). Other approaches, such as TwoWingOS (Yin and Roth, 2018) and DeClarE (Popat et al., 2018) combine the ED and TE models into a single end-to-end method.…”
Section: Related Work
confidence: 99%
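
The three sub-tasks named in this excerpt compose into a straightforward pipeline. The sketch below shows only the control flow under that decomposition; each stage is a hypothetical stub, not the FEVER baseline or any cited system's code.

```python
# Schematic of the three-stage decomposition quoted above:
# document retrieval -> evidence detection (ED) -> textual entailment (TE).
# All three stages are hypothetical stubs.

def retrieve_documents(claim: str) -> list[str]:
    """Find Wikipedia pages related to the claim (e.g. via TF-IDF or entity linking)."""
    raise NotImplementedError

def detect_evidence(claim: str, pages: list[str]) -> list[str]:
    """Select the sentences in the retrieved pages that bear on the claim."""
    raise NotImplementedError

def textual_entailment(claim: str, evidence: list[str]) -> str:
    """Decide SUPPORTED / REFUTED / NOT ENOUGH INFO from the evidence."""
    raise NotImplementedError

def verify(claim: str) -> tuple[str, list[str]]:
    pages = retrieve_documents(claim)
    evidence = detect_evidence(claim, pages)
    return textual_entailment(claim, evidence), evidence
```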