Proceedings of the Second Workshop on Insights From Negative Results in NLP 2021
DOI: 10.18653/v1/2021.insights-1.18
Generalization in NLI: Ways (Not) To Go Beyond Simple Heuristics

Abstract: Much of recent progress in NLU was shown to be due to models' learning dataset-specific heuristics. We conduct a case study of generalization in NLI (from MNLI to the adversarially constructed HANS dataset) in a range of BERT-based architectures (adapters, Siamese Transformers, HEX debiasing), as well as with subsampling the data and increasing the model size. We report 2 successful and 3 unsuccessful strategies, all providing insights into how Transformer-based models learn to generalize.
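The HANS dataset referenced in the abstract is built to expose shortcut heuristics such as lexical overlap, where a model predicts "entailment" whenever every hypothesis word also appears in the premise. A minimal, purely illustrative sketch of that heuristic (not code from the paper):

```python
# Illustrative sketch of the lexical-overlap heuristic that HANS targets.
# A model relying on this shortcut predicts "entailment" whenever every
# hypothesis token also occurs in the premise.

def lexical_overlap(premise: str, hypothesis: str) -> bool:
    """Return True if all hypothesis tokens occur in the premise."""
    premise_tokens = set(premise.lower().split())
    return all(tok in premise_tokens for tok in hypothesis.lower().split())

# The heuristic fires on both pairs, but only the first is entailment:
print(lexical_overlap("the doctor paid the actor",
                      "the doctor paid the actor"))  # True
print(lexical_overlap("the doctor paid the actor",
                      "the actor paid the doctor"))  # True, yet not entailment
```

HANS contains many such pairs where the heuristic fires but the gold label is non-entailment, which is why models that learned the shortcut on MNLI fail on it.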

Cited by 39 publications (19 citation statements)
References 23 publications (25 reference statements)
“…The only difference in the procedure is that we take the representation of the [CLS] token to be the embedding of the sentence and omit the MLP. We study one architecture, BERT-Small (Bhargava et al, 2021; Turc et al, 2019), which is a BERT architecture with 4 hidden layers (Devlin et al, 2018).…”
Section: Methods
confidence: 99%
“…Methods under this category do not directly alter the training dataset, but instead resort to changes in the modeling technique: these changes can be in terms of the optimization function, regularization, additional auxiliary costs, etc. The main idea in DB is to utilize known biases (or identify unknown biases) in the data distribution, model these biases in the training pipeline, and use this knowledge to train robust classifiers (Clark et al, 2019; Bhargava et al, 2021). In the image classification literature, there is growing consensus on enforcing a consistency on different views (or augmentations) of an image in order to achieve debiasing (Hendrycks et al, 2020c; Xu et al, 2020; Chai et al, 2021; Nam et al, 2021).…”
Section: Categorization Of Domain Generalization Methods
confidence: 99%
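One common instantiation of the debiasing idea the excerpt cites is the product-of-experts ensemble of Clark et al. (2019): during training, the main model's log-probabilities are added to those of a fixed, intentionally biased model, so the main model is only pushed to explain what the bias model cannot. A toy 3-class sketch (all numbers illustrative, not from the paper):

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

main_logits = np.array([[2.0, 0.5, 0.1]])   # trainable model
bias_logits = np.array([[3.0, 0.0, 0.0]])   # frozen bias-only model

# Product of experts: sum log-probabilities, then train the main model
# with the NLL of the combined distribution (gold class 0 here).
combined = log_softmax(main_logits) + log_softmax(bias_logits)
loss = -log_softmax(combined)[0, 0]
print(loss)
```

At test time only the main model is used, so the shortcut captured by the bias model is discarded.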
“…The third category on our source of shift axis concerns the case in which one data partition (usually the training set) is a fully natural corpus, but the other partition is designed with specific properties in mind, to address a generalisation aspect of interest. Data in the constructed partition may avoid or contain specific (syntactic) patterns (Bhargava et al, 2021; Cui et al, 2022), violate heuristics about gender (Dayanik and Padó, 2021; Libovický et al, 2022), or include unusually long or complex sequences (Lakretz et al, 2021a; Raunak et al, 2019). As an example of this shift source, Dankers et al (2022) …”
Section: Generated Shifts
confidence: 99%
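The constructed-partition setup the excerpt describes can be sketched in a few lines: the training set is the natural corpus, while the evaluation partition is selected for a specific property, here unusually long sentences (the data and threshold are illustrative assumptions, not from any of the cited works):

```python
# Minimal sketch of a constructed evaluation partition: train on natural
# data, evaluate on examples selected for a targeted property
# (here, sentences longer than a token-count threshold).

corpus = [
    "the cat sat",
    "dogs bark",
    "a remarkably long sentence with many more tokens than the others appears here",
]

LONG = 8  # token-count threshold (assumption, for illustration)
test_partition = [s for s in corpus if len(s.split()) > LONG]
train_partition = [s for s in corpus if len(s.split()) <= LONG]

print(len(train_partition), len(test_partition))  # 2 1
```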