Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2016
DOI: 10.18653/v1/p16-1204

Most "babies" are "little" and most "problems" are "huge": Compositional Entailment in Adjective-Nouns

Abstract: We examine adjective-noun (AN) composition in the task of recognizing textual entailment (RTE). We analyze behavior of ANs in large corpora and show that, despite conventional wisdom, adjectives do not always restrict the denotation of the nouns they modify. We use natural logic to characterize the variety of entailment relations that can result from AN composition. Predicting these relations depends on context and on commonsense knowledge, making AN composition especially challenging for current RTE systems.
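The abstract's claim is easiest to see with concrete adjective-noun pairs. Below is a minimal sketch (ours, not the paper's code or data) of the natural-logic view it describes: a restrictive adjective yields forward entailment, a prototypical one yields near-equivalence, and a privative one yields alternation. The adjective class lists and the default rule are illustrative assumptions; the paper's point is precisely that the correct relation depends on context and commonsense knowledge, not on a fixed lexicon.

```python
# A minimal sketch of natural-logic relations for adjective-noun (AN)
# composition. The adjective classes below are illustrative assumptions,
# not the paper's actual lexicon or method.

EQUIVALENCE = "≡"   # "little baby" ≈ "baby": most babies are little anyway
FORWARD     = "⊏"   # "red car" entails "car": ordinary restrictive reading
ALTERNATION = "|"   # "fake gun" and "gun" are mutually exclusive sets

# Hypothetical adjective classes for demonstration only.
PRIVATIVE       = {"fake", "counterfeit", "imaginary"}
NON_RESTRICTIVE = {"little", "huge"}    # prototypical of the noun they modify

def an_relation(adjective: str) -> str:
    """Guess the natural-logic relation between 'ADJ N' and bare 'N'."""
    if adjective in PRIVATIVE:
        return ALTERNATION
    if adjective in NON_RESTRICTIVE:
        return EQUIVALENCE
    return FORWARD  # default: the adjective restricts the noun's denotation

for adj, noun in [("little", "baby"), ("fake", "gun"), ("red", "car")]:
    print(f"'{adj} {noun}'  {an_relation(adj)}  '{noun}'")
```

A rule table like this is exactly what the paper argues is insufficient: the same adjective can behave restrictively in one context and non-restrictively in another.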

Cited by 34 publications (42 citation statements) · References 18 publications
“…As target datasets, we use the 10 datasets investigated by Poliak et al (2018b) in their hypothesis-only study, plus two test sets: GLUE's diagnostic test set, which was carefully constructed to not contain hypothesis-biases (Wang et al, 2018), and SNLI-hard, a subset of the SNLI test set that is thought to have fewer biases (Gururangan et al, 2018). The target datasets include human-judged datasets that used automatic methods to pair premises and hypotheses, and then relied on humans to label the pairs: SCITAIL, ADD-ONE-RTE (Pavlick & Callison-Burch, 2016), Johns Hopkins Ordinal Commonsense Inference, […] (Reisinger et al, 2015). As many of these datasets have different label spaces than SNLI, we define a mapping (Appendix A.1) from our models' predictions to each target dataset's labels.…”
Section: Methods
confidence: 99%
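The label-space mapping mentioned at the end of the quoted passage (its Appendix A.1) is, in essence, a projection from a 3-way SNLI prediction onto each target dataset's label set. Below is a hedged sketch of that idea with hypothetical mappings; the actual tables live in the cited paper's appendix, and the label names here are assumptions.

```python
# A minimal sketch, assuming hypothetical label sets, of mapping a 3-way
# SNLI-trained model's predictions onto a target dataset's label space.
# These dictionaries are illustrative, not the authors' Appendix A.1 tables.

SNLI_LABELS = ["entailment", "neutral", "contradiction"]

# e.g., a 2-way entails/neutral dataset such as SCITAIL:
# fold contradiction into neutral.
TO_SCITAIL = {
    "entailment": "entails",
    "neutral": "neutral",
    "contradiction": "neutral",
}

# e.g., a binary entailed/not-entailed dataset such as ADD-ONE-RTE.
TO_BINARY = {
    "entailment": "entailed",
    "neutral": "not-entailed",
    "contradiction": "not-entailed",
}

def map_prediction(label: str, mapping: dict) -> str:
    """Project a 3-way SNLI prediction onto a target dataset's labels."""
    return mapping[label]

assert map_prediction("contradiction", TO_SCITAIL) == "neutral"
assert map_prediction("entailment", TO_BINARY) == "entailed"
```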
“…Although NLI is generally cast in informal terms that embrace the indeterminacy of such reasoning, the task nonetheless manifests a number of very predictable reasoning patterns. For example, systematic manipulations of the lexical meanings (Glockner et al, 2018), syntactic constructions (Nie et al, 2019a), and contextual assumptions (Pavlick and Callison-Burch, 2016) have systematic effects on the correct labels. These patterns present crisp, motivated learning targets that we can leverage to not only evaluate the ability of NLI models to learn robust solutions, but also to analyze the internal dynamics of successful models.…”
Section: Introduction
confidence: 99%
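The "systematic manipulations" the quoted passage mentions can be made concrete with a tiny probe generator. The sketch below is illustrative, not code from any cited paper: swapping a word for a hypernym should move a premise/hypothesis pair toward entailment, while swapping in a co-hyponym should yield contradiction, in the spirit of the lexical manipulations attributed to Glockner et al (2018). The substitution tables are hypothetical.

```python
# A minimal sketch of generating NLI probes via systematic lexical
# substitution. The word tables are illustrative assumptions.

HYPERNYM   = {"dog": "animal", "violin": "instrument"}
CO_HYPONYM = {"dog": "cat",    "violin": "cello"}

def make_probes(sentence: str, word: str):
    """Generate (premise, hypothesis, expected_label) probe triples."""
    probes = []
    if word in HYPERNYM:    # broadening the term preserves truth
        probes.append((sentence, sentence.replace(word, HYPERNYM[word]),
                       "entailment"))
    if word in CO_HYPONYM:  # a sibling term makes the pair incompatible
        probes.append((sentence, sentence.replace(word, CO_HYPONYM[word]),
                       "contradiction"))
    return probes

for p, h, label in make_probes("A man is playing a violin .", "violin"):
    print(f"{label:13s} P: {p}  H: {h}")
```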
“…Past work has also evaluated commonsense capabilities in neural models. Pavlick and Callison-Burch (2016) investigate the related problem of entailment in adjective-nouns, and show surprising negative results for neural NLI models. Wang et al (2018) showed that models based on distributional semantics without explicit external knowledge perform poorly at predicting physical plausibility of actions.…”
Section: Related Work
confidence: 99%