Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics 2020
DOI: 10.18653/v1/2020.acl-main.441

Adversarial NLI: A New Benchmark for Natural Language Understanding

Abstract: We introduce a new large-scale NLI benchmark dataset, collected via an iterative, adversarial human-and-model-in-the-loop procedure. We show that training models on this new dataset leads to state-of-the-art performance on a variety of popular NLI benchmarks, while posing a more difficult challenge with its new test set. Our analysis sheds light on the shortcomings of current state-of-the-art models, and shows that non-expert annotators are successful at finding their weaknesses. The data collection method can …
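
The "human-and-model-in-the-loop" procedure named in the abstract can be made concrete with a short sketch. This is a minimal illustration rather than the authors' implementation: model.predict, ask_annotator, and verify_label are hypothetical interfaces, and it assumes a single adversarial round in which annotators write hypotheses until the current model mispredicts and other annotators verify the intended label.

    # Minimal sketch (Python) of one adversarial collection round.
    # All interfaces are hypothetical stand-ins for the paper's pipeline.
    def collect_round(model, premises, ask_annotator, verify_label, max_tries=5):
        """Keep examples that fool the current model and pass human verification."""
        dataset = []
        for premise in premises:
            for _ in range(max_tries):
                # An annotator writes a hypothesis for an assigned gold label
                # (entailment / neutral / contradiction).
                hypothesis, gold = ask_annotator(premise)
                pred = model.predict(premise, hypothesis)
                if pred != gold and verify_label(premise, hypothesis, gold):
                    # The model was fooled and humans confirm the label: keep it.
                    dataset.append((premise, hypothesis, gold))
                    break
        return dataset

    # Across rounds, the model is retrained on the accumulated data and the
    # loop repeats, yielding progressively harder examples.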

Cited by 462 publications (553 citation statements); references 41 publications.

Citation statements, ordered by relevance:
“…(Rajpurkar et al., 2016) and Winograd Schema Challenge data (Levesque et al., 2012) respectively into inference tasks. More recently, SciTail (Khot et al., 2018) and Adversarial NLI (Nie et al., 2019) have focused on building adversarial datasets; the former uses information retrieval to select adversarial premises, while the latter uses iterative annotation cycles to confuse models.…”
Section: Results (mentioning)
confidence: 99%
“…It is imperative to produce datasets that allow for controlled study of artifacts. A popular strategy today is to use adversarial annotation (Zellers et al., 2018; Nie et al., 2019) and rewriting of the input (Chen et al., 2020). We argue that we can systematically construct test sets that can help study artifacts along specific dimensions.…”
Section: Results (mentioning)
confidence: 99%
“…The most recent English corpus, Adversarial NLI (Nie et al., 2020), uses the Human-And-Model-in-the-Loop Enabled Training (HAMLET) method for data collection. Their annotation method requires an existing NLI corpus to train the model during annotation, which is not possible for Chinese at the moment, as there exists no high-quality Chinese data.…”
Section: Related Work (mentioning)
confidence: 99%
“…There have been several recent attempts to reduce such biases (Belinkov et al., 2019; Sakaguchi et al., 2020; Nie et al., 2020). There has also been a large body of work using probing datasets/tasks to stress-test NLI models trained on datasets such as SNLI and MNLI, in order to expose the weaknesses and biases in either the models or the data (Dasgupta et al., 2018; Naik et al., 2018; McCoy et al., 2019).…”
Section: Biases (mentioning)
confidence: 99%
“…NLI involves rich natural language understanding capabilities, many of which relate to world knowledge. To acquire such knowledge, researchers have found benefit from external knowledge bases like WordNet (Fellbaum, 1998), FrameNet (Baker, 2014), Wikidata (Vrandečić and Krötzsch, 2014), and large-scale human-annotated datasets (Bowman et al., 2015; Nie et al., 2020). Creating these resources generally requires expensive human annotation.…”
Section: Introduction (mentioning)
confidence: 99%