Data modification can introduce artificial information. It is often assumed that the resulting artefacts are detrimental to training, whilst being negligible when analysing models. We investigate these assumptions and conclude that in some cases they are unfounded and lead to incorrect results. Specifically, we show that current shape bias identification methods and occlusion robustness measures are biased, and we propose a fairer alternative for the latter. Subsequently, through a series of experiments we seek to correct and strengthen the community's perception of how distorting data affects learning. Based on our empirical results, we argue that the impact of the artefacts must be understood and exploited rather than eliminated.
Motivation

Modifying data has become commonplace both when training and when analysing models, yet the wider implications are often disregarded. As examples of data modification, on the analysis side we take occlusion robustness and shape bias identification methods. On the training side, we focus on some instances of Mixed Sample Data Augmentation (MSDA), where two images are combined to obtain a new training sample. Visual illustrations of each can be found in Figure 1. In this paper we delve into some of the side-effects of data modification and point out that this practice has resulted in the creation of biased model interpretation tools and poorly informed theories. More specifically, we study a number of assumptions which we show are erroneous and which lie at the heart of the methods we briefly introduce below. Contesting these assumptions has broader implications for the community's perception of what aspects of the data are important when learning.

Shape-texture bias: Deep models are known to be sensitive to interventions that are imperceptible to humans [35,14], as well as to other forms of distribution shifts [1,7,9]. It has been argued that this is intimately linked to networks tending to use texture rather than shape information [2,12]. Recently, input distortions have become a popular way of assessing a model's texture bias. To this end, images are divided into a grid and the resulting patches are randomly shuffled such that information is preserved locally, while the global shape is altered [32,26,24,42]. It is implicitly assumed that patch-shuffling does not introduce misleading shape or texture that could affect model evaluation. That is, if a model's accuracy drops when evaluated on patch-shuffled images, this degradation in performance is attributed entirely to the model's bias for shape information. Thus, any side-effects of the data manipulation process are considered negligible.

Occlusion robustness: Commonly, occlusion robustness is concerned with the amount of information that can be hidden from a model without affecting its ability to classify [e.g. 36, 28]. A widely adopted proxy for occlusion robustness is the raw accuracy obtained after superimposing a rectangular patch on an image [6,10,40,43,21]. We refer to this approach as CutOcclusion throughout the paper...
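To make the manipulations above concrete, the snippet below gives a minimal sketch of patch-shuffling, CutOcclusion-style masking, and MixUp-style mixing (one well-known MSDA instance). It is purely illustrative: the grid size, the occluder size and colour (a black square), and the mixing weight are assumptions of this sketch, not the settings used in the cited works.

```python
import numpy as np

def patch_shuffle(image: np.ndarray, grid: int = 4, rng=None) -> np.ndarray:
    """Split an image into a grid x grid layout of patches and shuffle them,
    preserving local texture while destroying global shape."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]
    ph, pw = h // grid, w // grid
    # Collect patches row by row (border pixels left over by integer division are dropped).
    patches = [image[r * ph:(r + 1) * ph, c * pw:(c + 1) * pw]
               for r in range(grid) for c in range(grid)]
    rng.shuffle(patches)
    # Reassemble the shuffled patches into a single image.
    rows = [np.concatenate(patches[r * grid:(r + 1) * grid], axis=1)
            for r in range(grid)]
    return np.concatenate(rows, axis=0)

def cut_occlude(image: np.ndarray, size: int = 56, rng=None) -> np.ndarray:
    """Superimpose a square patch at a random location, in the spirit of the
    CutOcclusion-style measure described above (size <= image height/width)."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    occluded = image.copy()
    occluded[top:top + size, left:left + size] = 0  # black occluder; colour is an assumption
    return occluded

def mixup(image_a: np.ndarray, image_b: np.ndarray, lam: float = 0.5) -> np.ndarray:
    """Blend two images (assumed float arrays in [0, 1]); in MixUp-style MSDA
    the labels are combined with the same weight lam."""
    return lam * image_a + (1.0 - lam) * image_b
```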