Distributionally Robust Stochastic Optimization with Wasserstein Distance

Gao, Rui; Kleywegt, Anton J.

doi:10.48550/arxiv.1604.02199

Cited by 146 publications

(293 citation statements)

References 47 publications

Mentioning citation statements: 289

“…There have been a number of theoretical works on DRO and Optimal Transport, see [9,8,22,48,40,55,58]. In particular, [26,50,28,27,6] study the theory and applications of DRO problems using Wasserstein distance to parameterize the constraint set. [59] generalizes models to unseen domains by training the models with DRO.…”

Section: Adversarial Attack and Human Vision (mentioning)

confidence: 99%

Human Imperceptible Attacks and Applications to Improve Fairness

Hua, Xu, Blanchet et al. (2021), preprint


Modern neural networks are able to perform at least as well as humans in numerous tasks involving object classification and image generation. However, small perturbations that are imperceptible to humans may significantly degrade the performance of well-trained deep neural networks. We provide a Distributionally Robust Optimization (DRO) framework that integrates human-based image quality assessment methods to design optimal attacks that are imperceptible to humans but significantly damaging to deep neural networks. Through extensive experiments, we show that our attack algorithm generates better-quality (less perceptible to humans) attacks than other state-of-the-art human imperceptible attack methods. Moreover, we demonstrate that DRO training using our optimally designed human imperceptible attacks can improve group fairness in image classification. Finally, we provide an algorithmic implementation that significantly speeds up DRO training, which may be of independent interest.


“…There has been substantial work on formulating appropriate uncertainty sets (Hu & Hong, 2013; Gao & Kleywegt, 2016; Levy et al., 2020). Most relevant to our multitask setting are group-structured uncertainty sets, where the maximum is taken over a mixture of sub-populations (Oren et al., 2019; Sagawa et al., 2020; Zhou et al., 2021).…”

Section: Related Work (mentioning)

confidence: 99%

Balancing Average and Worst-case Accuracy in Multitask Learning

Michel, Ruder, Yogatama (2021), preprint


When training and evaluating machine learning models on a large number of tasks, it is important to look not only at average task accuracy, which may be biased by easy or redundant tasks, but also at worst-case accuracy (i.e., the performance on the task with the lowest accuracy). In this work, we show how to use techniques from the distributionally robust optimization (DRO) literature to improve worst-case performance in multitask learning. We highlight several failure cases of DRO when applied off-the-shelf and present an improved method, Lookahead-DRO (L-DRO), which mitigates these issues. The core idea of L-DRO is to anticipate the interaction between tasks during training in order to choose a dynamic re-weighting of the various task losses, which will (i) lead to minimal worst-case loss and (ii) train on as many tasks as possible. After demonstrating the efficacy of L-DRO in a small, controlled synthetic setting, we evaluate it on two realistic benchmarks: a multitask version of the CIFAR-100 image classification dataset and a large-scale multilingual language modeling experiment. Our empirical results show that L-DRO achieves a better trade-off between average and worst-case accuracy with little computational overhead compared to several strong baselines.
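The group-structured uncertainty sets quoted above take the maximum over mixtures of sub-populations (here, tasks). A minimal sketch, assuming per-task losses are already computed: because the mixture loss is linear in the task weights, its maximum over the probability simplex is attained at a single task, so the worst-case mixture loss equals the worst single-task loss. The reweighting step is the standard exponentiated-gradient update of off-the-shelf group DRO (in the style of Sagawa et al., 2020), not L-DRO, whose lookahead step the abstract does not specify. The function names and the step size `eta` are hypothetical.

```python
import math


def mixture_loss(weights, task_losses):
    """Expected loss under a mixture of tasks with the given weights."""
    return sum(w * l for w, l in zip(weights, task_losses))


def worst_case_loss(task_losses):
    """A linear function on the simplex is maximized at a vertex, so the
    worst-case mixture loss is the largest single-task loss."""
    return max(task_losses)


def reweight(weights, task_losses, eta=0.1):
    """One exponentiated-gradient ascent step on the task weights:
    higher-loss tasks gain weight, then the weights are renormalized."""
    unnormalized = [w * math.exp(eta * l) for w, l in zip(weights, task_losses)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Hypothetical per-task losses, held fixed here for illustration
# (in real training they change as the model is updated).
task_losses = [0.2, 0.5, 1.3]
weights = [1 / 3] * 3
for _ in range(50):
    weights = reweight(weights, task_losses)

print(mixture_loss([1 / 3] * 3, task_losses))  # average-task loss
print(worst_case_loss(task_losses))            # worst single-task loss
print(weights)                                 # mass concentrates on task 3
```

With fixed losses the weights drift toward the hardest task, so the weighted loss approaches the worst-case loss; the abstract's point is that applying this off-the-shelf can neglect the other tasks, which L-DRO is designed to mitigate.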


“…Following from the measure concentration result, we know that if the radius is set according to (14), the Wasserstein set Ω will include the true measure P* with probability at least 1 − α, and thus the expected log-loss can be bounded by the optimal value of the DRO formulation (1). Theorem 3.1 ([11], Theorem 3.5).…”

Section: Out-of-sample Performance (mentioning)

confidence: 99%
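The reasoning in the quoted statement can be written as a short chain of implications. This is a sketch under stated assumptions: ℓ_B denotes the log-loss of coefficient matrix B, P̂_N the empirical distribution, W the Wasserstein metric, and ε_N the radius from (14). It also assumes formulation (1) has the min-sup form, so its optimal value Ĵ_N equals the worst-case expected loss of its optimal solution B_N. It restates the quoted argument and is not the exact statement of the cited theorem.

```latex
\Omega = \{\mathbb{Q} : W(\mathbb{Q}, \hat{\mathbb{P}}_N) \le \epsilon_N\},
\qquad
\hat{J}_N = \min_{B} \sup_{\mathbb{Q} \in \Omega} \mathbb{E}^{\mathbb{Q}}\big[\ell_B(\mathbf{x}, y)\big]
          = \sup_{\mathbb{Q} \in \Omega} \mathbb{E}^{\mathbb{Q}}\big[\ell_{B_N}(\mathbf{x}, y)\big]

\mathbb{P}^* \in \Omega \;\Longrightarrow\;
\mathbb{E}^{\mathbb{P}^*}\big[\ell_{B_N}(\mathbf{x}, y)\big]
\le \sup_{\mathbb{Q} \in \Omega} \mathbb{E}^{\mathbb{Q}}\big[\ell_{B_N}(\mathbf{x}, y)\big] = \hat{J}_N

\mathbb{P}^N\{\mathbb{P}^* \in \Omega\} \ge 1 - \alpha \;\Longrightarrow\;
\mathbb{P}^N\Big\{\mathbb{E}^{\mathbb{P}^*}\big[\ell_{B_N}(\mathbf{x}, y)\big] \le \hat{J}_N\Big\} \ge 1 - \alpha
```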

“…Theorem 3.1 ([11], Theorem 3.5). Suppose Ĵ_N and B_N are respectively the optimal value and optimal solution to the DRO problem (1) with the ambiguity set radius specified in (14), where α ∈ (0, 1). Then we have, with probability at least 1 − α with respect to the sampling,…”

Section: Out-of-sample Performance (mentioning)

confidence: 99%

“…We consider the multi-class classification problem under the framework of Distributionally Robust Optimization (DRO), where the ambiguity set is defined via the Wasserstein metric [14,11]. We focus on developing robust classification algorithms that are immunized against the presence of outliers in the data, motivated by the fact that standard approaches, such as Logistic Regression (LR), are vulnerable to contamination of the dataset by outliers.…”

Section: Introduction (mentioning)

confidence: 99%
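The ambiguity set in this formulation is a Wasserstein ball around the empirical distribution. As a minimal one-dimensional sketch (an illustrative assumption; the cited work uses distributions over feature-label pairs with its own transport cost), the order-1 Wasserstein distance between two empirical distributions with the same number N of equally weighted points is the mean absolute difference of their sorted samples. Checking ball membership is therefore a sort and a sum. The function names, data, and radius below are hypothetical.

```python
def wasserstein1_empirical(xs, ys):
    """Order-1 Wasserstein distance between two 1-D empirical distributions
    with the same number of equally weighted atoms. In one dimension the
    optimal coupling matches order statistics (sorted samples)."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def in_wasserstein_ball(candidate, empirical, radius):
    """True if the candidate empirical distribution lies in the ball
    {Q : W1(Q, P_hat) <= radius} centred at the empirical distribution."""
    return wasserstein1_empirical(candidate, empirical) <= radius


training = [0.1, 0.4, 0.5, 0.9]
shifted = [x + 0.05 for x in training]   # every point moved slightly
with_outlier = [0.1, 0.4, 0.5, 5.0]      # one point moved far away

print(wasserstein1_empirical(training, shifted))
print(in_wasserstein_ball(shifted, training, radius=0.1))
print(in_wasserstein_ball(with_outlier, training, radius=0.1))
```

Moving one of N points by a distance d costs only d/N, so a ball of moderate radius contains distributions in which a small fraction of points has been displaced, which is how a Wasserstein ambiguity set accounts for outliers.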


Distributionally Robust Multiclass Classification and Applications in Deep CNN Image Classifiers

Chen, Hao, Paschalidis (2021), preprint


We develop a Distributionally Robust Optimization (DRO) formulation for Multiclass Logistic Regression (MLR) that can tolerate data contaminated by outliers. The DRO framework uses a probabilistic ambiguity set defined as a ball of distributions that are close to the empirical distribution of the training set in the sense of the Wasserstein metric. We relax the DRO formulation into a regularized learning problem whose regularizer is a norm of the coefficient matrix. We establish out-of-sample performance guarantees for the solutions to our model, offering insights into the role of the regularizer in controlling the prediction error. We apply the proposed method to render deep CNN-based image classifiers robust to random and adversarial attacks. Specifically, using the MNIST and CIFAR-10 datasets, we demonstrate reductions in test error rate by up to 78.8% and loss by up to 90.8%. We also show that with a limited number of perturbed images in the training set, our method can improve the error rate by up to 49.49% and the loss by up to 68.93% compared to Empirical Risk Minimization (ERM), converging faster to an ideal loss/error rate as the number of perturbed images increases.
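The abstract states that the DRO formulation is relaxed into a regularized learning problem whose regularizer is a norm of the coefficient matrix. A minimal sketch of that objective's shape, assuming an empirical multiclass log-loss plus ε times a matrix norm, with ε playing the role of the Wasserstein radius. The Frobenius norm here is an illustrative stand-in: which norm actually appears depends on the transport cost chosen in the paper. All function names and data are hypothetical.

```python
import math


def softmax_log_loss(B, x, y):
    """Multiclass log-loss -log softmax(B x)[y] for one sample.
    B is a K x d list of rows, x a length-d feature list, y a class index."""
    scores = [sum(b_j * x_j for b_j, x_j in zip(row, x)) for row in B]
    m = max(scores)  # subtract the max score for numerical stability
    log_norm = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_norm - scores[y]


def frobenius_norm(B):
    """Frobenius norm of B (a stand-in for the paper's matrix norm)."""
    return math.sqrt(sum(v * v for row in B for v in row))


def regularized_objective(B, samples, epsilon):
    """Average log-loss over the samples plus epsilon * ||B||."""
    empirical = sum(softmax_log_loss(B, x, y) for x, y in samples) / len(samples)
    return empirical + epsilon * frobenius_norm(B)


samples = [([1.0, 0.0], 0), ([0.0, 1.0], 1), ([1.0, 1.0], 2)]
B = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # 3 classes, 2 features

print(regularized_objective(B, samples, epsilon=0.0))  # plain empirical risk
print(regularized_objective(B, samples, epsilon=0.1))  # with the norm penalty
```

Setting `epsilon=0` recovers plain empirical risk minimization; a larger radius penalizes large coefficient matrices more heavily.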
