2019
DOI: 10.48550/arXiv.1911.08731
Preprint

Distributionally Robust Neural Networks for Group Shifts: On the Importance of Regularization for Worst-Case Generalization

Abstract: Overparameterized neural networks can be highly accurate on average on an i.i.d. test set yet consistently fail on atypical groups of the data (e.g., by learning spurious correlations that hold on average but not in such groups). Distributionally robust optimization (DRO) allows us to learn models that instead minimize the worst-case training loss over a set of pre-defined groups. However, we find that naively applying group DRO to overparameterized neural networks fails: these models can perfectly fit the training data…
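The group DRO objective the abstract refers to can be written down directly: instead of the average training loss, minimize the maximum, over the pre-defined groups, of each group's average loss. Below is a minimal PyTorch sketch of that worst-group loss. The function name and signature are illustrative, not the authors' code, and the paper's actual training algorithm maintains an online reweighting over groups rather than taking this hard maximum at every step.

    import torch

    def worst_group_loss(per_example_loss: torch.Tensor,
                         group_ids: torch.Tensor,
                         n_groups: int) -> torch.Tensor:
        # Average the per-example losses within each group, then take
        # the maximum over groups: the group DRO objective as stated
        # in the abstract.
        group_losses = []
        for g in range(n_groups):
            mask = group_ids == g
            if mask.any():  # a minibatch may not contain every group
                group_losses.append(per_example_loss[mask].mean())
        return torch.stack(group_losses).max()

Minimizing this upweights whichever group currently has the highest loss. The abstract's point is that, for overparameterized networks, this alone does not improve worst-group test error, since such models can drive every group's training loss to zero; hence the emphasis on regularization.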

Citation Types: 5 supporting, 309 mentioning, 0 contrasting

Year Published: 2020–2024

Cited by 171 publications (314 citation statements)
References 39 publications
“…Empirically, we conduct studies on a set of challenging synthetic linear benchmarks designed by Aubin et al. (2021) and a set of real-world datasets (two image datasets and one text dataset) used in Sagawa et al. (2019). Our empirical results on the synthetic benchmarks validate the claimed environment complexities, and also demonstrate its superior performance when compared with IRM and its variant.…”
Section: Introduction (citation type: mentioning)
confidence: 68%
“…Since real-world data are highly complex and non-linear, so that the ISR approach cannot be applied to them directly, we apply ISR on top of the features extracted by the hidden layers of trained neural nets as a post-processing procedure. Experiments show that ISR-Mean can consistently increase the worst-case accuracy of trained models against spurious correlations and group shifts, including models trained by ERM, reweighting, and GroupDRO (Sagawa et al., 2019).…”
Section: Introduction (citation type: mentioning)
confidence: 91%
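The "worst-case accuracy" this statement refers to is the accuracy of the worst-performing group, the standard evaluation metric in this line of work. For reference, a minimal NumPy sketch of that metric; the helper name and signature are hypothetical, not code from the ISR work.

    import numpy as np

    def worst_group_accuracy(preds: np.ndarray,
                             labels: np.ndarray,
                             group_ids: np.ndarray) -> float:
        # Compute accuracy within each group, then report the minimum
        # over groups (the worst-performing group's accuracy).
        accs = [
            float((preds[group_ids == g] == labels[group_ids == g]).mean())
            for g in np.unique(group_ids)
        ]
        return min(accs)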