Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics 2020
DOI: 10.18653/v1/2020.acl-main.543
Do Neural Models Learn Systematicity of Monotonicity Inference in Natural Language?

Abstract: Despite the success of language models using neural networks, it remains unclear to what extent neural models have the generalization ability to perform inferences. In this paper, we introduce a method for evaluating whether neural models can learn systematicity of monotonicity inference in natural language, namely, the regularity for performing arbitrary inferences with generalization on composition. We consider four aspects of monotonicity inferences and test whether the models can systematically interpret lexical and logical phenomena on different training/test splits.
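
To make the task concrete (an illustrative sketch, not the paper's data or code): monotonicity inference licenses replacing a phrase with a more general one in an upward-monotone context and with a more specific one in a downward-monotone context. A minimal Python sketch with hypothetical premise/hypothesis pairs and gold labels:

# Illustrative sketch only: hypothetical monotonicity-inference pairs,
# not the paper's dataset or generation code.
# "some" creates an upward-monotone context: dog => animal preserves truth.
# "no" creates a downward-monotone context: animal => dog preserves truth.
EXAMPLES = [
    # (premise, hypothesis, gold label)
    ("Some dogs ran",    "Some animals ran", "entailment"),
    ("Some animals ran", "Some dogs ran",    "neutral"),
    ("No animals ran",   "No dogs ran",      "entailment"),
    ("No dogs ran",      "No animals ran",   "neutral"),
]

for premise, hypothesis, label in EXAMPLES:
    print(f"{premise!r} -> {hypothesis!r}: {label}")

The systematicity question is whether a model trained on some combinations of quantifiers and lexical replacements generalizes to unseen combinations of those same components.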

Cited by 39 publications (52 citation statements)
References 31 publications (26 reference statements)
“…Defining disjoint train/test splits is enough to foil truly unsystematic models (e.g., simple look-up tables). However, building on much previous work (Lake and Baroni, 2018; Hupkes et al., 2019; Yanaka et al., 2020; Bahdanau et al., 2018; Goodwin et al., 2020; Geiger et al., 2019), we contend that a randomly constructed disjoint train/test split only diagnoses the most basic level of systematicity. More difficult systematic generalization tasks will only be solved by models exhibiting more complex compositional structures.…”
Section: A Systematic Generalization Task
confidence: 92%
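
As a concrete reading of this point (a sketch under assumed data, not code from the cited work): a split can be made disjoint at the level of component combinations, so that a pure look-up table fails by construction even though every individual component is observed in training.

# Sketch only, with hypothetical quantifiers and lexical relations: hold out
# combinations, not components. Every quantifier and every relation appears
# in training, but the held-out (quantifier, relation) pairs never do, so a
# model that merely memorizes seen pairs cannot solve the test set.
import itertools

quantifiers = ["some", "no", "every", "at most three"]
relations = [("dog", "animal"), ("rose", "flower"), ("run", "move")]

# Cycle through relations so each quantifier loses exactly one relation and
# each relation is still trained on with at least one other quantifier.
held_out = {(q, relations[i % len(relations)]) for i, q in enumerate(quantifiers)}
train = [p for p in itertools.product(quantifiers, relations) if p not in held_out]

print(f"{len(train)} train combinations; {len(held_out)} held-out combinations")

A random disjoint split of this kind diagnoses only basic systematicity, which is the limitation the citing authors point out; harder variants also change the syntactic structures between training and test.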
“…There are often strong intuitions that certain generalization tasks are only solved by models with systematic structures. These tasks are referred to as systematic generalization tasks (Lake and Baroni, 2018; Hupkes et al., 2019; Yanaka et al., 2020; Bahdanau et al., 2018; Geiger et al., 2019; Goodwin et al., 2020).…”
Section: Related Work
confidence: 99%
“…The GLUE and SuperGlue datasets include diagnostic sets where annotators manually labeled samples of examples as requiring a broad range of linguistic phenomena. The types of phenomena manually labeled include lexical semantics, predicate-argument structure, logic, and common sense or world knowledge. Specific phenomena targeted in prior work include Proto-Roles (White et al., 2017), Paraphrastic Inference (White et al., 2017), Event Factuality (Poliak et al., 2018b; Staliūnaitė, 2018), Anaphora Resolution (White et al., 2017; Poliak et al., 2018b), Lexicosyntactic Inference (Pavlick and Callison-Burch, 2016; Poliak et al., 2018b; Glockner et al., 2018), Compositionality (Dasgupta et al., 2018), Prepositions (Kim et al., 2019), Comparatives (Kim et al., 2019; Richardson et al., 2020), Quantification/Numerical Reasoning (Naik et al., 2018; Kim et al., 2019; Richardson et al., 2020), Spatial Expressions (Kim et al., 2019), Negation (Naik et al., 2018; Kim et al., 2019; Richardson et al., 2020), Tense & Aspect (Kober et al., 2019), Veridicality (Poliak et al., 2018b), Monotonicity (Yanaka et al., 2019, 2020; Richardson et al., 2020), Presupposition (Jeretic et al., 2020), Implicatures (Jeretic et al., 2020), and Temporal Reasoning (Vashishtha et al., 2020).…”
Section: Manually Created
confidence: 99%
“…In contrast to Dagan et al. (2005), the task definitions were short and loose, relying on the annotators' common sense understanding. Many works since have been using the NLI framework and the crowdsourcing procedure associated with it to test models for different language phenomena (Marelli et al., 2014; Lai et al., 2017; Naik et al., 2018; Ross and Pavlick, 2019; Yanaka et al., 2020).…”
Section: Started a New Book I Bought Last Week
confidence: 99%
“…In light of the low agreement on explicit modeling of the task of complement coercion, we turn to a different crowdsourcing approach which has proven successful for many linguistic phenomena: using NLI as discussed above (§2). NLI has been used to collect data for a wide range of linguistic phenomena: Paraphrase Inference, Anaphora Resolution, Numerical Reasoning, Implicatures, and more (White et al., 2017; Poliak et al., 2018; Jeretic et al., 2020; Yanaka et al., 2020; Naik et al., 2018) (see Poliak (2020)). Therefore, we take a similar approach, with similar methodologies, and make use of NLI as an evaluation setup for the complement coercion phenomenon.…”
Section: NLI for Complement Coercion
confidence: 99%