Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics 2019
DOI: 10.18653/v1/p19-1084

Don’t Take the Premise for Granted: Mitigating Artifacts in Natural Language Inference

Abstract: Natural Language Inference (NLI) datasets often contain hypothesis-only biases: artifacts that allow models to achieve non-trivial performance without learning whether a premise entails a hypothesis. We propose two probabilistic methods to build models that are more robust to such biases and better transfer across datasets. In contrast to standard approaches to NLI, our methods predict the probability of a premise given a hypothesis and NLI label, discouraging models from ignoring the premise. We evaluate our m…


Cited by 52 publications (77 citation statements)
References: 45 publications
“…Given the wide impact that large-scale NLI datasets, such as SNLI and MNLI, have had on recent progress in NLU for English, we hope that our resource will likewise help accelerate progress on Chinese NLU. In addition to making more progress on Chinese NLI, future work will also focus on using our dataset for doing Chinese model probing (e.g., building on work such as Warstadt et al (2019); Jeretic et al (2020)) and sentence representation learning (Reimers and Gurevych, 2019), as well as for investigating bias-reduction techniques (Clark et al, 2019; Belinkov et al, 2019) for languages other than English.…”
Section: Discussion (mentioning)
confidence: 99%
“…There have been several recent attempts to reduce such biases (Belinkov et al, 2019; Sakaguchi et al, 2020; Nie et al, 2020). There has also been a large body of work using probing datasets/tasks to stress-test NLI models trained on datasets such as SNLI and MNLI, in order to expose the weaknesses and biases in either the models or the data (Dasgupta et al, 2018; Naik et al, 2018; McCoy et al, 2019).…”
Section: Biases (mentioning)
confidence: 99%
“…We show that the bias can be reduced in the sentence representations by using an ensemble of adversaries, encouraging the model to jointly decrease the accuracy of these different adversaries while fitting the data. This approach produces more robust NLI models, outperforming previous de-biasing efforts when generalised to 12 other NLI datasets (Belinkov et al, 2019a; Mahabadi et al, 2020). In addition, we find that the optimal number of adversarial classifiers depends on the dimensionality of the sentence representations, with larger sentence representations being more difficult to de-bias while benefiting from using a greater number of adversaries.…”
Section: (unspecified) (mentioning)
confidence: 74%
“…weighting affected training examples. They are often evaluated using adversarial or synthetic sets that contain counterexamples, in which relying on the examined bias will result in incorrect predictions (Belinkov et al, 2019; Clark et al, 2019; He et al, 2019; Mahabadi et al, 2020).…”
Section: Introduction (mentioning)
confidence: 99%