Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) 2020
DOI: 10.18653/v1/2020.emnlp-main.731

COGS: A Compositional Generalization Challenge Based on Semantic Interpretation

Abstract: Natural language is characterized by compositionality: the meaning of a complex expression is constructed from the meanings of its constituent parts. To facilitate the evaluation of the compositional abilities of language processing architectures, we introduce COGS, a semantic parsing dataset based on a fragment of English. The evaluation portion of COGS contains multiple systematic gaps that can only be addressed by compositional generalization; these include new combinations of familiar syntactic structures,…
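The abstract's core idea — that the meaning of a complex expression is built from the meanings of its parts — can be illustrated with a toy rule-based parser. This is a hypothetical sketch, not the COGS data format or any code from the paper: the lexicon, role names (`agent`, `theme`), and the `parse` function are all invented for illustration.

```python
# Toy compositional semantic parser: maps a simple
# "Subject verb the object" sentence to a logical form
# assembled from per-word meanings in a small lexicon.
LEXICON = {
    "ate": ("eat", "agent", "theme"),
    "saw": ("see", "agent", "theme"),
}

def parse(sentence):
    """Compose a logical form from the sentence's parts, e.g.
    'Emma ate the cake' -> 'eat(agent=Emma, theme=cake)'."""
    words = sentence.rstrip(".").split()
    subj, verb, obj = words[0], words[1], words[-1]
    pred, role1, role2 = LEXICON[verb]
    return f"{pred}({role1}={subj}, {role2}={obj})"

print(parse("Emma ate the cake"))  # eat(agent=Emma, theme=cake)
```

Because the output is assembled rule-by-rule from the lexicon, any new subject/object combination is handled automatically — exactly the kind of systematic generalization that COGS tests whether learned models can achieve.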

Cited by 128 publications (209 citation statements)
References 37 publications
“…Whether our models have learned to solve tasks in robust and generalizable ways has been a topic of much recent interest. Challenging test sets have shown that many state-of-the-art NLP models struggle with compositionality (Kim and Linzen, 2020; Yu and Ettinger, 2020; White et al., 2020), and find it difficult to pass the myriad stress tests for social (May et al., 2019; Nangia et al., 2020) and/or linguistic competencies (Geiger et al., 2018; Naik et al., 2018; Glockner et al., 2018; White et al., 2018; Warstadt et al., 2019; Gauthier et al., 2020; Hossain et al., 2020; Jeretic et al., 2020; Lewis et al., 2020; Saha et al., 2020; Schuster et al., 2020; Sugawara et al., 2020). Yet, challenge sets may suffer from performance instability (Liu et al., 2019a; Rozen et al., 2019) and often lack sufficient statistical power (Card et al., 2020), suggesting that, although they may be valuable assessment tools, they are not sufficient for ensuring that our models have achieved the learning targets we set for them.…”
Section: Challenge Sets and Adversarial Settings
confidence: 99%
“…The task is framed as a sequence generation task. We use the recently proposed COGS dataset (Kim and Linzen, 2020).…”
Section: Results On Compositional Generalization Challenge and Semantic Parsing
confidence: 99%
“…However, this is unlikely to produce human-like learning and generalisation, particularly in terms of extrapolation beyond the training distribution. For example, Kim and Linzen (2020) find that neural models of semantic parsing struggle to generalise from shallower structures (e.g., Ava saw the ball in the bottle on the table) to more deeply nested structures (e.g., …).…”
Section: Discussion
confidence: 99%
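The depth-generalization failure described in the statement above (shallow vs. deeply nested prepositional phrases) can be made concrete with a toy generator for nested noun phrases. This is an illustrative sketch only — it is not COGS's actual data-generation code, and the preposition list is an invented placeholder.

```python
# Build a noun phrase with recursively nested prepositional phrases,
# e.g. ["ball", "bottle", "table"] -> "the ball in the bottle on the table".
# Training sets can cap the nesting depth; a depth-generalization test
# set then uses longer noun chains than any seen in training.
def nest_pps(nouns):
    preps = ["in", "on", "beside"]  # placeholder prepositions
    phrase = f"the {nouns[0]}"
    for i, noun in enumerate(nouns[1:]):
        phrase += f" {preps[i % len(preps)]} the {noun}"
    return phrase

print(nest_pps(["ball", "bottle", "table"]))
# the ball in the bottle on the table
```

A model that has learned the recursive rule should handle four- or five-noun chains even if it was only trained on two- or three-noun ones; the cited finding is that standard neural semantic parsers often do not.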