2020
DOI: 10.48550/arxiv.2007.02931
Preprint

Adaptive Risk Minimization: Learning to Adapt to Domain Shift

Abstract: A fundamental assumption of most machine learning algorithms is that the training and test data are drawn from the same underlying distribution. However, this assumption is violated in almost all practical applications: machine learning systems are regularly tested on data that are structurally different from the training set, either due to temporal correlations, particular end users, or other factors. In this work, we consider the setting where test examples are not drawn from the training distribution. Prior…
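
The abstract is truncated above, but the core idea of the paper, echoed in the citation contexts below, is to meta-train models that adapt to a shifted test distribution using a batch of unlabeled test inputs. The following is a minimal, hedged PyTorch sketch of one such training step in the spirit of the paper's contextual-adaptation variant; the module names (ContextNet, Predictor, arm_training_step), layer sizes, and toy data are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch (not the released ARM code): one way to set up an ARM-style
# training step, where a context network summarizes an unlabeled batch drawn
# from a single domain/group and the predictor is conditioned on that summary.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContextNet(nn.Module):
    """Maps a batch of inputs to a single context vector (batch average)."""
    def __init__(self, in_dim, ctx_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, ctx_dim))

    def forward(self, x):
        return self.net(x).mean(dim=0, keepdim=True)  # average over the batch

class Predictor(nn.Module):
    """Classifier that sees each input concatenated with the shared context."""
    def __init__(self, in_dim, ctx_dim, n_classes):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim + ctx_dim, 64), nn.ReLU(), nn.Linear(64, n_classes))

    def forward(self, x, ctx):
        ctx = ctx.expand(x.shape[0], -1)              # broadcast context to every example
        return self.net(torch.cat([x, ctx], dim=1))

def arm_training_step(ctx_net, predictor, opt, x_batch, y_batch):
    """Adapt using the unlabeled batch, then minimize the post-adaptation risk."""
    ctx = ctx_net(x_batch)                            # "adaptation" uses inputs only, no labels
    logits = predictor(x_batch, ctx)
    loss = F.cross_entropy(logits, y_batch)           # risk after adaptation
    opt.zero_grad()
    loss.backward()                                   # gradients flow through the adaptation
    opt.step()
    return loss.item()

# Toy usage: each training batch is drawn from one simulated group/domain.
in_dim, ctx_dim, n_classes = 20, 8, 5
ctx_net, predictor = ContextNet(in_dim, ctx_dim), Predictor(in_dim, ctx_dim, n_classes)
opt = torch.optim.Adam(list(ctx_net.parameters()) + list(predictor.parameters()), lr=1e-3)
x = torch.randn(32, in_dim)
y = torch.randint(0, n_classes, (32,))
print(arm_training_step(ctx_net, predictor, opt, x, y))
```

At test time the same ContextNet would be run on a batch of unlabeled test inputs, so the adaptation mechanism learned during meta-training carries over to shifted domains.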

Cited by 8 publications (15 citation statements)
References 59 publications
“…Note that we also provide the performance of our SelfReg applied with the SWA [23] technique.

Method | Acc. 1 (%) | Acc. 2 (%) | Acc. 3 (%) | Acc. 4 (%) | Avg. (%)
[38] | 86.6 ± 0.8 | 81.8 ± 0.9 | 97.1 ± 0.5 | 82.7 ± 0.6 | 87.1
SelfReg (ours) † | 87.9 ± 0.5 | 80.6 ± 1.1 | 97.1 ± 0.4 | 81.1 ± 1.3 | 86.7 ± 0.8
ERM [41] | 86.5 ± 1.0 | 81.3 ± 0.6 | 96.2 ± 0.3 | 82.7 ± 1.1 | 86.7
RSC [22] | 86.0 ± 0.7 | 81.8 ± 0.9 | 96.8 ± 0.7 | 80.4 ± 0.5 | 86.2
ARM [47] | 85.0 ± 1.2 | 81.4 ± 0.2 | 95.9 ± 0.3 | 80.9 ± 0.5 | 85.8
VREx [25] | 87.8 ± 1.2 | 81.8 ± 0.7 | 97.4 ± 0.2 | 82.1 ± 0.7 | 87.2
MLDG [27] | 87.0 ± 1.2 | 82.5 ± 0.9 | 96.7 ± 0.3 | 81.2 ± 0.6 | 86.8
MMD [28] | 88.1 ± 0.8 | 82.6 ± 0.7 | 97.1 ± 0.5 | 81.2 ± 1.2 | 87.2
Mixup [45] | 87.5 ± 0.4 | 81.6 ± 0.7 | 97.4 ± 0.2 | 80.8 ± 0.9 | 86.8
MTL [3] | 87.0 ± 0.2 | 82.7 ± 0.8 | 96.5 ± 0.7 | 80.5 ± 0.8 | 86.7
GroupDRO [36] | 87.5 ± 0.5 | 82.9 ± 0.6 | 97.1 ± 0.3 | 81.1 ± 1.2 | 87.1
DANN [15] | 87.0 ± 0.4 | 80.3 ± 0.6 | 96.8 ± 0.3 | 76.9 ± 1.1 | 85.2
IRM [1] | 84.2 ± 0.9 | 79.7 ± 1.5 | 95.9 ± 0.4 | 78.3 ± 2.1 | 84.5
CDANN [29] | 87.7 ± 0.6 | 80.7 ± 1.2 | 97.3 ± 0.4 | 77.6 ± 1.5 | 85.8…”
Section: Discussion (mentioning)
confidence: 99%
“…ColoredMNIST [1], RotatedMNIST [16], VLCS [13], PACS [26], Office-Home [42], TerraIncognita [2], and DomainNet [34]) and provides benchmark results for 14 baseline approaches (i.e., ERM [41], IRM [1], GroupDRO [36], Mixup [45], MLDG [27], CORAL [38], MMD [28], DANN [15], CDANN [29], MTL [3], SagNet [32], ARM [47], VREx [25], RSC [22]).…”
Section: Experiments on DomainBed (mentioning)
confidence: 99%
“…As [17,39] point out, meta-learning methods can be categorized into three groups: (1) metric-based methods [35,30,32] perform non-parametric learning in a metric space and are largely restricted to the popular few-shot classification setting. (2) Optimization-based methods [2,6,17,45] use gradient descent to solve the meta-learner's optimization problem. The most famous example is MAML [6], which learns transferable initial parameters such that a few gradient updates lead to improved performance.…”
Section: Related Work (mentioning)
confidence: 99%
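
The quoted related-work passage above describes MAML's learned initialization that improves after only a few gradient steps. As a concrete illustration, here is a hedged, toy PyTorch sketch of that inner/outer loop; the task distribution, learning rates, and step counts are made-up assumptions, not taken from MAML [6] or from this paper.

```python
# Hedged toy illustration of a MAML-style inner/outer loop (not the authors' code).
import torch

def model(params, x):
    w, b = params
    return x @ w + b                                 # tiny linear regressor

def loss_fn(params, x, y):
    return ((model(params, x) - y) ** 2).mean()

def sample_task(n=16):
    """Toy task family: y = a * x with a task-specific slope a."""
    a = torch.randn(1)
    xs, xq = torch.randn(n, 1), torch.randn(n, 1)
    return (xs, a * xs), (xq, a * xq)                # (support set, query set)

w = torch.zeros(1, 1, requires_grad=True)            # meta-learned initialization
b = torch.zeros(1, requires_grad=True)
meta_opt = torch.optim.SGD([w, b], lr=1e-2)
inner_lr = 0.1

for step in range(100):
    meta_opt.zero_grad()
    for _ in range(4):                               # small meta-batch of tasks
        (x_s, y_s), (x_q, y_q) = sample_task()
        # Inner loop: one gradient step on the support set, keeping the graph
        # so the outer update can differentiate through the adaptation.
        grads = torch.autograd.grad(loss_fn((w, b), x_s, y_s), (w, b), create_graph=True)
        adapted = (w - inner_lr * grads[0], b - inner_lr * grads[1])
        # Outer loop: query loss of the adapted parameters, backprop to (w, b).
        loss_fn(adapted, x_q, y_q).backward()
    meta_opt.step()
```

Meta-testing would start from the learned (w, b), take a few inner steps on a small support set from a new task, and evaluate on the rest of that task's data.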
“…The most famous example is MAML [6], which learns transferable initial parameters such that a few gradient updates lead to improved performance. Recently, [45] proposed Adaptive Risk Minimization to handle group distribution shift in image classification. (3) Network-based methods [28,23,24] use a network to learn cross-task knowledge and rapidly update its parameters for a new task.…”
Section: Related Work (mentioning)
confidence: 99%
“…Deep neural networks have achieved impressive performance by minimizing the average loss on training datasets. Although we typically adopt the empirical risk minimization framework as a training objective, it is sometimes problematic: dataset bias can lead to significant degradation of worst-case generalization performance, as discussed in [2,35,17,11,36]. This is because models do not always learn what we expect; on the contrary, they may capture unintended decision rules based on spurious correlations.…”
Section: Introduction (mentioning)
confidence: 99%
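
The contrast drawn in this quoted introduction, between minimizing the average training loss and protecting worst-case (per-group) generalization, can be made concrete with a small sketch. The following hedged PyTorch snippet computes both an average (ERM-style) loss and a worst-group loss in the spirit of distributionally robust objectives such as GroupDRO; the group labels, model, and data are toy assumptions, not code from any of the cited works.

```python
# Hedged sketch: average-loss (ERM-style) objective vs. a worst-group objective
# on toy grouped data. The model, data, and group labels are illustrative.
import torch
import torch.nn.functional as F

def per_group_losses(model, x, y, g, n_groups):
    """Mean cross-entropy within each group k; empty groups fall back to 0."""
    losses = F.cross_entropy(model(x), y, reduction="none")
    return torch.stack([
        losses[g == k].mean() if (g == k).any() else losses.new_zeros(())
        for k in range(n_groups)
    ])

model = torch.nn.Linear(10, 3)                       # toy classifier
opt = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(64, 10)
y = torch.randint(0, 3, (64,))
g = torch.randint(0, 4, (64,))                       # group/domain label per example

erm_loss = F.cross_entropy(model(x), y)              # average loss over all examples
group_losses = per_group_losses(model, x, y, g, n_groups=4)
robust_loss = group_losses.max()                     # loss of the worst-performing group

opt.zero_grad()
robust_loss.backward()                               # optimize the worst-case group risk
opt.step()
print(f"avg loss {erm_loss.item():.3f}, per-group {group_losses.tolist()}")
```

GroupDRO itself maintains a smoothly reweighted distribution over groups rather than taking a hard maximum, but the hard max above conveys the worst-case intuition the passage refers to.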