Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019
DOI: 10.18653/v1/d19-1438
Compositional Generalization for Primitive Substitutions

Abstract: Compositional generalization is a basic mechanism in human language learning, but current neural networks lack such ability. In this paper, we conduct fundamental research for encoding compositionality in neural networks. Conventional methods use a single representation for the input sentence, making it hard to apply prior knowledge of compositionality. In contrast, our approach leverages such knowledge with two representations, one generating attention maps, and the other mapping attended input words to output symbols…
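The abstract describes a two-representation design: one representation produces attention maps over the input words, while the other maps each attended word directly to output symbols. Below is a minimal PyTorch sketch of that idea for a SCAN-like setting; the module names, dimensions, single-step decoder, and attention query are illustrative assumptions, not the authors' actual architecture.

    # Sketch of a two-representation model: attention and word-to-symbol
    # mapping are computed from separate representations. Illustrative only.
    import torch
    import torch.nn as nn

    class TwoRepresentationModel(nn.Module):
        def __init__(self, in_vocab, out_vocab, d_attn=64, d_map=64):
            super().__init__()
            # Representation 1: contextual encoding used only to produce attention maps.
            self.attn_embed = nn.Embedding(in_vocab, d_attn)
            self.attn_rnn = nn.LSTM(d_attn, d_attn, batch_first=True, bidirectional=True)
            self.query = nn.Parameter(torch.randn(2 * d_attn))
            # Representation 2: context-free mapping from each input word to output symbols.
            self.map_embed = nn.Embedding(in_vocab, d_map)
            self.map_out = nn.Linear(d_map, out_vocab)

        def forward(self, tokens):
            # tokens: (batch, seq_len) integer ids of the input command
            ctx, _ = self.attn_rnn(self.attn_embed(tokens))    # (B, T, 2*d_attn)
            attn = torch.softmax(ctx @ self.query, dim=-1)     # (B, T) attention map
            symbols = self.map_out(self.map_embed(tokens))     # (B, T, out_vocab)
            # One decoding step: attention-weighted mixture of per-word symbol predictions.
            return (attn.unsqueeze(-1) * symbols).sum(dim=1)   # (B, out_vocab)

    model = TwoRepresentationModel(in_vocab=20, out_vocab=10)
    print(model(torch.randint(0, 20, (2, 5))).shape)  # torch.Size([2, 10])

Keeping the word-to-symbol mapping independent of sentence context is one way a newly substituted primitive (e.g., "jump") can reuse attention behavior learned from other primitives, which is the kind of prior knowledge of compositionality the abstract refers to.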


Cited by 49 publications (83 citation statements)
References 28 publications
“…The table shows results (mean test accuracy (%) ± standard deviation) on the test splits of the dataset. Syntactic Attention is compared to the previous models, which were a CNN (Dessì and Baroni, 2019), GRUs augmented with an attention mechanism ("+ attn"), which either included or did not include a dependency ("-dep") in the decoder on the previous action (Bastings et al., 2018), and the recent model of Li et al. (2019). Lake (2019) showed that a meta-learning architecture using an external memory achieves 99.95% accuracy on a meta-seq2seq version of the SCAN task.…”
Section: Compositional Generalization Results (mentioning)
confidence: 99%
“…One illustrative example is the poor performance of LSTMs on a SCAN split that requires generalizing from shorter to longer sequences. While several models have made significant improvements on other SCAN splits, progress on the length split remains minimal (Li et al., 2019; Gordon et al., 2020).…”
Section: Comparison to Related Work (mentioning)
confidence: 99%
“…Crucially, their training/evaluation split required compositional generalization. A number of models have been developed that have improved performance on SCAN (Li et al., 2019; Gordon et al., 2020). However, since the semantic representation used by SCAN only covers a small subset of English grammar, SCAN does not enable testing various systematic linguistic abstractions that humans are known to make (e.g., verb argument structure alternation).…”
Section: Introduction (mentioning)
confidence: 99%
“…Many past works in the rich body of literature about analyzing NNs focus on compositional structure (Hupkes et al., 2018, 2020; Hewitt and Manning, 2019; Li et al., 2019) and systematicity (Lake and Baroni, 2018; Goodwin et al., 2020). Two of the most popular analysis techniques are the behavioral and probing approaches.…”
Section: Analysis of NNs (mentioning)
confidence: 99%