Out-of-Distribution Generalization via Risk Extrapolation (REx)

Krueger, David; Caballero, Ethan; Jacobsen, Joern-Henrik; Zhang, Amy; Binas, Jonathan; Zhang, Dinghuai; Le Priol, Remi; Courville, Aaron

doi:10.48550/arxiv.2003.00688

Cited by 59 publications (123 citation statements, 118 of them mentioning)

References 22 publications (42 reference statements)

“…A plethora of algorithms have been proposed: learning invariant representations across domains [7,21,38,20], minimizing a weighted combination of risks from training domains [35], using different risk penalty terms to facilitate invariant prediction [1,17], causal inference approaches [31], forcing the learned representation to differ from a set of predefined biased representations [2], mixup-based approaches [48,41,26], etc. A recent study [10] shows that no domain generalization method achieves superior performance to ERM across a broad range of datasets.…”

Section: Discussion and Related Work (mentioning)

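Among the approaches listed, [35] (labelled GDRO in the results quoted on this page) minimizes a weighted combination of training-domain risks. One way to choose those weights, assumed here to be the exponentiated-gradient update from Group DRO, upweights whichever domain currently has the highest loss. The helper `group_dro_step` is hypothetical and covers only the weight update, not model training:

```python
import math


def group_dro_step(group_losses, weights, eta):
    """One Group DRO-style weight update (hypothetical helper, assumed rule).

    group_losses: current average loss of each training domain/group.
    weights: current mixture weights over groups (non-negative, summing to 1).
    eta: step size of the exponentiated-gradient ascent on the weights.
    Returns the updated weights and the weighted risk the model would minimize.
    """
    # Upweight high-loss groups: w_g <- w_g * exp(eta * loss_g).
    raw = [w * math.exp(eta * loss) for w, loss in zip(weights, group_losses)]
    total = sum(raw)
    new_weights = [r / total for r in raw]  # renormalize onto the simplex
    # Weighted combination of group risks under the updated weights.
    weighted_risk = sum(w * loss for w, loss in zip(new_weights, group_losses))
    return new_weights, weighted_risk


if __name__ == "__main__":
    w, risk = group_dro_step([1.0, 0.0], [0.5, 0.5], eta=1.0)
    print([round(x, 3) for x in w], round(risk, 3))  # prints [0.731, 0.269] 0.731
```

A larger `eta` shifts weight toward the worst-off domain faster, so the weighted risk moves closer to the worst-case domain risk.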

“…Extension: Empirical Validation of Theoretical Analysis. To further validate our analysis above, we comprehensively evaluate the OOD detection performance of models that are trained with recent prominent domain invariance learning objectives [1,2,17,7,21,35] (Section E in Appendix). The results align with our theoretical analysis.…”

Section: OOD Type (mentioning)


“…Follow-up works proposed several variations based on different notions of invariance. In particular, [17] proposed Risk Extrapolation (REx), which aims to achieve stronger invariance of p(y|Φ(x)) by penalizing the variance of the risks across training environments. Other approaches have proposed removing the predictability of p(e|Φ(x)) through domain adversarial losses such as DANN [7] and CDANN [21] (adapted for domain generalization).…”

Section: E Extension: Training With Domain Invariance Objectives (mentioning)

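The snippet above describes REx as penalizing the variance of risks across environments. A minimal sketch of that penalty, assuming the variance form often called V-REx (objective = sum of per-environment risks + β times the variance of those risks), with hypothetical helper names and per-example losses supplied as plain lists instead of being computed by a model:

```python
from statistics import fmean, pvariance


def environment_risks(losses_by_env):
    """Empirical risk per environment: the mean per-example loss in each one."""
    return [fmean(losses) for losses in losses_by_env]


def vrex_objective(losses_by_env, beta):
    """Assumed V-REx-style objective: sum of risks + beta * variance of risks.

    With beta = 0 this is just the sum of environment risks; a large beta
    pushes training toward equal risk in every environment.
    """
    risks = environment_risks(losses_by_env)
    return sum(risks) + beta * pvariance(risks)


if __name__ == "__main__":
    # Two hypothetical environments; the model fits the first one far better.
    losses = [[0.2, 0.4], [0.9, 1.1]]
    print(environment_risks(losses))
    print(vrex_objective(losses, beta=0.0), vrex_objective(losses, beta=10.0))
```

Here the risks are roughly 0.3 and 1.0, so raising β from 0 to 10 adds 10 × 0.1225 to the objective; equal risks would make the penalty vanish for any β.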

“…In-distribution Classification Performance.
… 99.98 ± 0.02, 100.00 ± 0.00, 99.99 ± 0.02
GDRO [35]: 99.97 ± 0.04, 99.98 ± 0.03, 99.98 ± 0.02
REx [17]: 100.00 ± 0.00, 99.99 ± 0.02, 99.99 ± 0.02
DANN [7]: 99.97 ± 0.02, 99.99 ± 0.02, 99.99 ± 0.02
CDANN [21]: 99.97 ± 0.02, 99.99 ± 0.02, 99.98 ± 0.02

… 95.97 ± 0.62
GDRO [35]: 95.74 ± 0.54
REx [17]: 95.49 ± 0.77
DANN [7]: 96.27 ± 0.25
CDANN [21]: 94.74 ± 0.63…”

Section: F Experiments Details and In-distribution Classification Per... (mentioning)



On the Impact of Spurious Correlation for Out-of-distribution Detection

Yifei, Yin, Li (2021). Preprint.


Modern neural networks can assign high confidence to inputs drawn from outside the training distribution, posing threats to models in real-world deployments. While much research attention has been placed on designing new out-of-distribution (OOD) detection methods, the precise definition of OOD is often left vague and falls short of the desired notion of OOD in reality. In this paper, we present a new formalization and model the data shifts by taking into account both the invariant and environmental (spurious) features. Under this formalization, we systematically investigate how spurious correlation in the training set impacts OOD detection. Our results suggest that detection performance is severely worsened when the correlation between spurious features and labels is increased in the training set. We further offer insights into detection methods that are more effective in reducing the impact of spurious correlation, and provide a theoretical analysis of why reliance on environmental features leads to high OOD detection error. Our work aims to facilitate a better understanding of OOD samples and their formalization, as well as the exploration of methods that enhance OOD detection.
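The abstract's formalization separates invariant features from environmental (spurious) ones and varies how strongly the latter correlate with the label in training. A toy generator in that spirit might look as follows; it rests entirely on assumed choices (one invariant and one environmental feature, Gaussian noise, labels in {-1, +1}), and the names `sample_training_set` and `sign_agreement` and the noise scale are hypothetical, not the paper's data model:

```python
import random


def sample_training_set(n, spurious_corr, noise=0.5, seed=0):
    """Draw n toy examples (invariant, environmental, y) with y in {-1, +1}.

    Assumed toy model: the invariant feature is y plus Gaussian noise for every
    example. Before noise is added, the environmental feature's sign agrees
    with y with probability spurious_corr, so raising spurious_corr
    strengthens the spurious correlation in the training set.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        y = rng.choice([-1, 1])
        invariant = y + rng.gauss(0.0, noise)
        env_sign = y if rng.random() < spurious_corr else -y
        environmental = env_sign + rng.gauss(0.0, noise)
        data.append((invariant, environmental, y))
    return data


def sign_agreement(data, index):
    """Fraction of examples whose feature at `index` has the same sign as y."""
    return sum(1 for ex in data if (ex[index] > 0) == (ex[2] > 0)) / len(data)


if __name__ == "__main__":
    for r in (0.5, 0.7, 0.9):
        data = sample_training_set(5000, spurious_corr=r)
        print(r, round(sign_agreement(data, 0), 3), round(sign_agreement(data, 1), 3))
```

Only the environmental feature's agreement with the label changes as `spurious_corr` grows; training detectors on such sets and scoring OOD inputs is not shown.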



“…In an ideal case, even a simple k-NN-based method can perform well. However, when the variation among class-conditionals of the same class is large, i.e., the closest conditional distribution to T(X|Y = y) is some S(X|Y = y′) of a class y′ ≠ y (Figure 1 Right), the aforementioned methods may not perform well (Krueger et al., 2020).…”

Section: Introduction (mentioning)

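The failure case in the snippet above, where a target class-conditional T(X|Y = y) sits closest to a source conditional of a different class y′, can be reproduced with a crude stand-in for distributional distance: comparing class centroids. The data points and the helpers `centroid` and `nearest_source_class` are hypothetical and only illustrate the geometry:

```python
import math


def centroid(points):
    """Mean vector of a list of equal-length feature tuples."""
    dim = len(points[0])
    return tuple(sum(p[i] for p in points) / len(points) for i in range(dim))


def nearest_source_class(target_points, source_by_class):
    """Source class whose conditional has the centroid closest to the target sample's."""
    t = centroid(target_points)
    return min(source_by_class, key=lambda c: math.dist(t, centroid(source_by_class[c])))


if __name__ == "__main__":
    # Source domain: class 0 centred at (0, 1), class 1 centred at (4, 1).
    source = {0: [(0.0, 0.0), (0.0, 2.0)], 1: [(4.0, 0.0), (4.0, 2.0)]}
    # Target samples that truly belong to class 0, but shifted to centre (3, 1).
    target_class0 = [(3.0, 0.0), (3.0, 2.0)]
    print(nearest_source_class(target_class0, source))  # prints 1, not 0
```

The shifted class-0 target lies at distance 3 from the source class-0 centroid but only 1 from the class-1 centroid, so any nearest-match rule assigns it to the wrong class.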

Class-conditioned Domain Generalization via Wasserstein Distributional Robust Optimization

Wang, Li, Xie, et al. (2021). Preprint.


Given multiple source domains, domain generalization aims at learning a universal model that performs well on any unseen but related target domain. In this work, we focus on the domain generalization scenario where domain shifts occur among the class-conditional distributions of different domains. Existing approaches are not sufficiently robust when the variation of conditional distributions given the same class is large. We extend the concept of distributionally robust optimization to solve the class-conditional domain generalization problem. Our approach optimizes the worst-case performance of a classifier over class-conditional distributions within a Wasserstein ball centered around the barycenter of the source conditional distributions. We also propose an iterative algorithm for learning the optimal radius of the Wasserstein balls automatically. Experiments show that the proposed framework performs better on unseen target domains than approaches without domain generalization.
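The abstract centres its uncertainty set on the Wasserstein barycenter of the source class-conditionals. Assuming, purely for illustration, that each class-conditional is a 1-D Gaussian given as (mean, std), both the 2-Wasserstein distance and the equal-weight barycenter have closed forms, so the ball's center and a membership check can be sketched as below. The worst-case optimization over the ball and the iterative radius-learning algorithm are not reproduced, and all function names are hypothetical:

```python
import math


def w2_gaussian_1d(p, q):
    """Closed-form 2-Wasserstein distance between 1-D Gaussians given as (mean, std)."""
    (m1, s1), (m2, s2) = p, q
    return math.hypot(m1 - m2, s1 - s2)


def w2_barycenter_1d(gaussians):
    """Equal-weight W2 barycenter of 1-D Gaussians: average the means and the stds."""
    n = len(gaussians)
    return (sum(m for m, _ in gaussians) / n, sum(s for _, s in gaussians) / n)


def in_wasserstein_ball(candidate, source_conditionals, radius):
    """True if `candidate` lies within `radius` of the sources' barycenter."""
    center = w2_barycenter_1d(source_conditionals)
    return w2_gaussian_1d(candidate, center) <= radius


if __name__ == "__main__":
    # Hypothetical 1-D conditionals p(x | y) for one class y in two source domains.
    sources = [(0.0, 1.0), (2.0, 1.0)]
    print(w2_barycenter_1d(sources))                             # (1.0, 1.0)
    print(in_wasserstein_ball((1.5, 1.2), sources, radius=1.0))  # True
    print(in_wasserstein_ball((4.0, 1.0), sources, radius=1.0))  # False
```

A larger radius admits more shifted conditionals into the ball, which makes the worst-case objective over it more conservative.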


Data-driven subgrid-scale modeling of forced Burgers turbulence using deep learning with generalization to higher Reynolds numbers via transfer learning

et al. (2021).


Developing data-driven subgrid-scale (SGS) models for large eddy simulations (LES) has received substantial attention recently. Despite some success, particularly in a priori (offline) tests, challenges have been identified, including numerical instabilities in a posteriori (online) tests and generalization (i.e., extrapolation) of trained data-driven SGS models, for example to higher Reynolds numbers. Here, using stochastically forced Burgers turbulence as the test-bed, we show that deep neural networks trained using properly pre-conditioned (augmented) data yield stable and accurate a posteriori LES models. Furthermore, we show that transfer learning enables accurate and stable generalization to a flow with a 10× higher Reynolds number.

