2020
DOI: 10.48550/arxiv.2007.08558
Preprint

On Robustness and Transferability of Convolutional Neural Networks

Abstract: Modern deep convolutional networks (CNNs) are often criticized for not generalizing under distributional shifts. However, several recent breakthroughs in transfer learning suggest that these networks can cope with severe distribution shifts and successfully adapt to new tasks from a few training examples. In this work we revisit the out-of-distribution and transfer performance of modern image classification CNNs and investigate the impact of the pre-training data size, the model scale, and the data preprocessing…

Cited by 12 publications (17 citation statements)
References 46 publications (86 reference statements)
“…As for CNNs, Djolonga et al. (2020) evaluate the impact of model size and dataset size on robustness, where the classes at train and test time are the same but there is a distribution shift in the data, for example changes in the lighting of image samples. They find that scaling both model size and training-set size improves such robustness.…”
Section: Related Work
confidence: 99%
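To make the kind of evaluation described above concrete, here is a minimal sketch that measures a fixed classifier's accuracy on clean versus lighting-shifted inputs (same label set, shifted input distribution). The ResNet-50 checkpoint, the brightness factor, and the `imagenet/val` path are illustrative assumptions, not the setup of Djolonga et al. (2020), who sweep far larger pre-training sets and model scales.

```python
# Sketch: compare accuracy of a fixed classifier on clean vs. darkened inputs.
# Model choice, brightness factor, and dataset path are illustrative only.
import torch
import torchvision.transforms.functional as TF
from torchvision import datasets, models, transforms

device = "cuda" if torch.cuda.is_available() else "cpu"
model = models.resnet50(weights="IMAGENET1K_V1").to(device).eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])

def accuracy(loader, shift=None):
    correct = total = 0
    with torch.no_grad():
        for x, y in loader:
            if shift is not None:
                x = shift(x)  # apply the distribution shift to raw pixels
            x = TF.normalize(x, mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225])
            pred = model(x.to(device)).argmax(dim=1).cpu()
            correct += (pred == y).sum().item()
            total += y.numel()
    return correct / total

# A crude stand-in for a lighting shift: darken every image by a fixed factor.
darken = lambda x: TF.adjust_brightness(x, brightness_factor=0.4)

val = datasets.ImageFolder("imagenet/val", transform=preprocess)  # hypothetical path
loader = torch.utils.data.DataLoader(val, batch_size=64, num_workers=4)
print("clean:", accuracy(loader), "shifted:", accuracy(loader, shift=darken))
```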
“…where $L^q_+$ denotes the subspace of almost-everywhere non-negative functions of $L^q$, for $(1/p) + (1/q) = 1$. Here the Lagrangian $\mathcal{L}_{\mathrm{PI}}(\theta, t, \lambda)$ is defined as $$\mathcal{L}_{\mathrm{PI}}(\theta, t, \lambda) = \mathbb{E}_{(x,y)\sim\mathcal{D}}\bigl[t(x, y)\bigr] + \int \lambda(x, \delta, y)\Bigl[\ell\bigl(f_\theta(x + \delta), y\bigr) - t(x, y)\Bigr]\,dx\,d\delta\,dy = \int t(x, y)\Bigl[p(x, y) - \int \lambda(x, \delta, y)\,d\delta\Bigr]\,dx\,dy + \int \lambda(x, \delta, y)\,\ell\bigl(f_\theta(x + \delta), y\bigr)\,dx\,d\delta\,dy, \tag{14}$$ where we used the density $p$ of the data distribution $\mathcal{D}$. Then, notice that (PV) can be written iteratively as $P_R = \min_{\theta \in \Theta} p(\theta)$ where $p(\theta) = \min_{t \in L^p} \max$…”
Section: B Proof of Proposition 3.1
confidence: 99%
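The step the quoted proof takes from (14) to the iterated min-max can be filled in with one line: the Lagrangian is linear in $t$, so minimizing over $t \in L^p$ either enforces a density-matching constraint on $\lambda$ or diverges. The display below is a sketch of that standard duality argument in our notation (the loss symbol $\ell$ included), not a quotation from the citing paper.

```latex
% Standard duality step, sketched: the Lagrangian (14) is linear in t, so the
% inner minimum over t in L^p is finite only when the coefficient of t
% vanishes almost everywhere.
\min_{t \in L^p} \mathcal{L}_{\mathrm{PI}}(\theta, t, \lambda) =
\begin{cases}
  \displaystyle \int \lambda(x,\delta,y)\,
    \ell\bigl(f_\theta(x+\delta), y\bigr)\,dx\,d\delta\,dy
    & \text{if } \displaystyle\int \lambda(x,\delta,y)\,d\delta = p(x,y)
      \ \text{a.e.}, \\[6pt]
  -\infty & \text{otherwise.}
\end{cases}
```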
“…Adversarial robustness. As described in Section 1, it is well-known that state-of-the-art classifiers are susceptible to adversarial attacks [11][12][13][14][15][16][17][26]. Toward addressing this challenge, a rapidly-growing body of work has provided attack algorithms that generate data perturbations which fool classifiers, and defense algorithms designed to train classifiers to be robust against these perturbations.…”
Section: Further Related Work
confidence: 99%
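As one concrete instance of the attack algorithms the passage alludes to (not a method from the cited paper), the fast gradient sign method of Goodfellow et al. (2015) perturbs an input one step along the sign of the input-gradient of the loss. A minimal PyTorch sketch:

```python
# Minimal sketch of one classic attack, the fast gradient sign method (FGSM);
# illustrative of the "attack algorithms" mentioned above. `model` is any
# differentiable classifier mapping images in [0, 1] to logits.
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=8 / 255):
    """Perturb x by eps along the sign of the input-gradient of the loss."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + eps * x.grad.sign()        # one signed gradient step
    return x_adv.clamp(0.0, 1.0).detach()  # keep pixels in the valid range
```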
“…These evaluations tend instead to focus on shifts between photos and stylized versions like sketches (Li et al., 2017; Venkateswara et al., 2017; Peng et al., 2019) or synthetic renderings (Peng et al., 2018), or between variants of digits datasets like MNIST (LeCun et al., 1998) and SVHN (Netzer et al., 2011). Unfortunately, prior work has shown that methods that work well on one type of shift need not generalize to others (Taori et al., 2020; Djolonga et al., 2020; Xie et al., 2021a; Miller et al., 2021), which raises the question of how well they would work on a wider array of realistic shifts.…”
Section: Introduction
confidence: 99%