2020
DOI: 10.1162/tacl_a_00338

Beat the AI: Investigating Adversarial Human Annotation for Reading Comprehension

Abstract: Innovations in annotation methodology have been a catalyst for Reading Comprehension (RC) datasets and models. One recent trend to challenge current RC models is to involve a model in the annotation process: Humans create questions adversarially, such that the model fails to answer them correctly. In this work we investigate this annotation methodology and apply it in three different settings, collecting a total of 36,000 samples with progressively stronger models in the annotation loop. This allows us to expl…
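The protocol the abstract describes is simple to state: an annotator writes a question, the model in the loop answers it, and the sample counts as "model-fooling" only if the model's answer misses the gold answer. Below is a minimal sketch in Python of that acceptance check; the model.answer(passage, question) interface and the word-overlap F1 threshold are illustrative assumptions, not the authors' released code.

from collections import Counter

def f1_overlap(prediction: str, gold: str) -> float:
    """Token-level F1 between a predicted and a gold answer span."""
    pred_tokens, gold_tokens = prediction.split(), gold.split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

def is_model_fooling(model, passage: str, question: str, gold_answer: str,
                     threshold: float = 0.4) -> bool:
    """Keep the sample only if the in-the-loop model's answer misses the gold span."""
    prediction = model.answer(passage, question)  # hypothetical RC model API
    return f1_overlap(prediction, gold_answer) < threshold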

Cited by 80 publications (80 citation statements)
References 34 publications (41 reference statements)
“…Creating model-fooling examples is not as easy as it used to be, and finding interesting examples is rapidly becoming a less trivial task. In ANLI, the verified model error rate for crowd workers in the later rounds went below 1-in-10 (Nie et al., 2020), while in "Beat the AI", human performance decreased while time per valid adversarial example went up with stronger models in the loop (Bartolo et al., 2020). For expert linguists, we expect the model error to be much higher, but if the platform actually lives up to its virtuous cycle promise, that error rate will go down quickly.…”
Section: Dynabench (mentioning)
confidence: 93%
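The "1-in-10" figure quoted above refers to the verified model error rate (vMER): the fraction of annotator-written examples confirmed by human validators to be genuine model errors. Read this way (an assumption about the metric, not a definition given on this page):

\[
\text{vMER} = \frac{\text{verified model errors}}{\text{total examples written by annotators}}
\]

so "below 1-in-10" means \(\text{vMER} < 0.1\): fewer than one in ten annotation attempts fools the model once validated.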
“…Research progress has traditionally been driven by a cyclical process of resource collection and architectural improvements. Similar to Dynabench, recent work seeks to embrace this phenomenon, addressing many of the previously mentioned issues through an iterative human-and-model-in-the-loop annotation process (Yang et al., 2017; Dinan et al., 2019; Chen et al., 2019; Bartolo et al., 2020), to find "unknown unknowns" (Attenberg et al., 2015) or in a never-ending or life-long learning setting (Silver et al., 2013; Mitchell et al., 2018). The Adversarial NLI (ANLI) dataset (Nie et al., 2020), for example, was collected with an adversarial setting over multiple rounds to yield "a 'moving post' dynamic target for NLU systems, rather than a static benchmark that will eventually saturate".…”
Section: Adversarial Training and Testing (mentioning)
confidence: 99%
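The multi-round "moving target" dynamic quoted above can be sketched as a loop: each round gathers examples that fool the current model, then a stronger model is retrained on the accumulated data before the next round begins. The train_model and collect_fooling_examples callbacks below are hypothetical placeholders, not APIs from ANLI or Dynabench.

def collect_rounds(train_model, collect_fooling_examples, base_data, num_rounds=3):
    """Iterate human-and-model-in-the-loop collection over several rounds."""
    data = list(base_data)
    model = train_model(data)  # round-0 model faced by the first annotators
    for _ in range(num_rounds):
        # Annotators write examples until a batch of verified model failures exists.
        adversarial_batch = collect_fooling_examples(model)
        data.extend(adversarial_batch)
        # Retraining moves the target: the next round faces a stronger model.
        model = train_model(data)
    return model, data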