Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019
DOI: 10.18653/v1/d19-1456
Posing Fair Generalization Tasks for Natural Language Inference

Abstract: Deep learning models for semantics are generally evaluated using naturalistic corpora. Adversarial methods, in which models are evaluated on new examples with known semantic properties, have begun to reveal that good performance at these naturalistic tasks can hide serious shortcomings. However, we should insist that these evaluations be fair - that the models are given data sufficient to support the requisite kinds of generalization. In this paper, we define and motivate a formal notion of fairness in this sen…

Cited by 40 publications (43 citation statements)
References 17 publications
“…Defining disjoint train/test splits is enough to foil truly unsystematic models (e.g., simple look-up tables). However, building on much previous work (Lake and Baroni, 2018; Hupkes et al., 2019; Yanaka et al., 2020; Bahdanau et al., 2018; Goodwin et al., 2020; Geiger et al., 2019), we contend that a randomly constructed disjoint train/test split only diagnoses the most basic level of systematicity. More difficult systematic generalization tasks will only be solved by models exhibiting more complex compositional structures.…”
Section: A Systematic Generalization Task
confidence: 92%
“…There is an extensive literature on monotonicity logics (Moss, 2009; Icard, 2012; Icard and Moss, 2013). Within NLP, MacCartney and Manning (2008, 2009) apply very rich monotonicity algebras to NLI problems, Hu et al. (2019a,b) create NLI models that use polarity-marked parse trees, and Yanaka et al. (2019a,b) and Geiger et al. (2019) investigate the ability of neural models to understand natural logic reasoning. While we consider only a small fragment of these approaches, the methods we develop should apply to more complex systems as well.…”
Section: Related Work
confidence: 99%
“…Salvatore et al. (2019) use synthetic data generated from logical forms to evaluate the performance of textual entailment models (e.g., BERT). Geiger et al. (2019) use synthetic data to create fair evaluation sets for natural language inference. Geva et al. (2020) show the importance of injecting numerical reasoning via generated data into the model to solve reading comprehension tasks.…”
Section: Counterfactual Data Generation
confidence: 99%