Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics 2020
DOI: 10.18653/v1/2020.acl-main.441

Adversarial NLI: A New Benchmark for Natural Language Understanding

Abstract: We introduce a new large-scale NLI benchmark dataset, collected via an iterative, adversarial human-and-model-in-the-loop procedure. We show that training models on this new dataset leads to state-of-the-art performance on a variety of popular NLI benchmarks, while posing a more difficult challenge with its new test set. Our analysis sheds light on the shortcomings of current state-of-the-art models, and shows that non-expert annotators are successful at finding their weaknesses. The data collection method can …
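
The "human-and-model-in-the-loop" procedure named in the abstract can be made concrete with a short sketch. This is a minimal illustration rather than the authors' implementation: model.predict, ask_annotator, and verify_label are hypothetical interfaces, and it assumes a single adversarial round in which annotators write hypotheses until the current model mispredicts and other annotators verify the intended label.

    # Minimal sketch (Python) of one adversarial collection round.
    # All interfaces are hypothetical stand-ins for the paper's pipeline.
    def collect_round(model, premises, ask_annotator, verify_label, max_tries=5):
        """Keep examples that fool the current model and pass human verification."""
        dataset = []
        for premise in premises:
            for _ in range(max_tries):
                # An annotator writes a hypothesis for an assigned gold label
                # (entailment / neutral / contradiction).
                hypothesis, gold = ask_annotator(premise)
                pred = model.predict(premise, hypothesis)
                if pred != gold and verify_label(premise, hypothesis, gold):
                    # The model was fooled and humans confirm the label: keep it.
                    dataset.append((premise, hypothesis, gold))
                    break
        return dataset

    # Across rounds, the model is retrained on the accumulated data and the
    # loop repeats, yielding progressively harder examples.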

Cited by 462 publications (553 citation statements); references 41 publications.

Citation statements, ordered by relevance:
“…(Rajpurkar et al., 2016) and Winograd Schema Challenge data (Levesque et al., 2012) respectively into inference tasks. More recently, SciTail (Khot et al., 2018) and Adversarial NLI (Nie et al., 2019) have focused on building adversarial datasets; the former uses information retrieval to select adversarial premises, while the latter uses iterative annotation cycles to confuse models.…”
Section: Results (mentioning)
confidence: 99%
“…It is imperative to produce datasets that allow for controlled study of artifacts. A popular strategy today is to use adversarial annotation (Zellers et al., 2018; Nie et al., 2019) and rewriting of the input (Chen et al., 2020). We argue that we can systematically construct test sets that can help study artifacts along specific dimensions.…”
Section: Results (mentioning)
confidence: 99%
“…The most recent English corpus, Adversarial NLI (Nie et al., 2020), uses the Human-And-Model-in-the-Loop Enabled Training (HAMLET) method for data collection. Their annotation method requires an existing NLI corpus to train the model during annotation, which is not possible for Chinese at the moment, as there exists no high-quality Chinese data.…”
Section: Related Work (mentioning)
confidence: 99%
“…There have been several recent attempts to reduce such biases (Belinkov et al., 2019; Sakaguchi et al., 2020; Nie et al., 2020). There has also been a large body of work using probing datasets/tasks to stress-test NLI models trained on datasets such as SNLI and MNLI, in order to expose the weaknesses and biases in either the models or the data (Dasgupta et al., 2018; Naik et al., 2018; McCoy et al., 2019).…”
Section: Biases (mentioning)
confidence: 99%
“…NLI involves rich natural language understanding capabilities, many of which relate to world knowledge. To acquire such knowledge, researchers have found benefit from external knowledge bases like WordNet (Fellbaum, 1998), FrameNet (Baker, 2014), Wikidata (Vrandečić and Krötzsch, 2014), and large-scale human-annotated datasets (Bowman et al., 2015; Nie et al., 2020). Creating these resources generally requires expensive human annotation.…”
Section: Introduction (mentioning)
confidence: 99%