2016
DOI: 10.48550/arxiv.1608.04644
Preprint

Towards Evaluating the Robustness of Neural Networks

Cited by 87 publications (169 citation statements) | References 0 publications
“…One major focus of these optimization problems is on testing the resilience of neural networks against adversarial attack [7]. This involves either maximizing a notion of resilience [8] or finding minimal perturbations needed to misclassify an image [16].…”
Section: Outline and Contributions
confidence: 99%
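For reference, a minimal sketch of the minimal-perturbation formulation the excerpt above alludes to, written in generic notation (the classifier C, input x, and norm p are assumptions here, not taken from the citing papers):

\begin{align*}
\min_{\delta}\;\; & \|\delta\|_{p} \\
\text{s.t.}\;\; & C(x + \delta) \neq C(x), \\
& x + \delta \in [0, 1]^{n}.
\end{align*}

The complementary view mentioned in the excerpt, maximizing a notion of resilience, instead fixes a perturbation budget and asks how far the loss or margin can be degraded within it.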
“…The full optimization problem is reproduced in Eqns. (7), and the associated sets are given in Table 2.…”
Section: E R
confidence: 99%
“…• l0 attack: OnePixel [Su et al., 2019], SparseFool [Modas et al., 2019] • l2 attack: Projected Gradient Descent-l2 (PGDL2) [Goodfellow et al., 2014, Madry et al., 2017], DeepFool [Moosavi-Dezfooli et al., 2015], CW attack [Carlini and Wagner, 2016], AutoAttack-l2 [Wong et al., 2020] • l∞ attack: Fast Gradient Sign Method (FGSM) [Goodfellow et al., 2014], Projected Gradient Descent (PGD) [Goodfellow et al., 2014, Madry et al., 2017], AutoAttack-l∞ [Wong et al., 2020]. FGSM: as one of the earliest and most popular adversarial attacks, described by Goodfellow et al. [2014], the Fast Gradient Sign Method (FGSM) serves as a baseline attack in our training. As noted previously, optimizing the perturbation against the trained models amounts to maximizing the loss function over δ.…”
Section: Adversarial Attack
confidence: 99%
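To make the FGSM step in the excerpt above concrete, here is a minimal sketch, assuming a PyTorch classifier model, an input batch x with pixels in [0, 1], labels y, and a budget eps; all names are illustrative and not taken from the cited papers:

import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps):
    # Single-step attack: move x by eps in the direction of the sign of the
    # loss gradient, i.e. a one-shot maximization of the loss over delta.
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + eps * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()  # keep pixels in the valid range

The clamp reflects the assumption that inputs are normalized to [0, 1]; under a different normalization the projection range would change accordingly.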
“…This is mainly due to the time-consuming generation of adversarial examples, which alone requires an optimization procedure, e.g. via the fast gradient sign method (FGSM) [Goodfellow et al., 2014], projected gradient descent (PGD) [Goodfellow et al., 2014, Madry et al., 2017, Kurakin et al., 2016], the One Pixel attack [Su et al., 2019], the CW attack [Carlini and Wagner, 2016], or DeepFool [Moosavi-Dezfooli et al., 2015].…”
Section: Introduction
confidence: 99%
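As a rough illustration of why PGD-style generation is the time-consuming part the excerpt refers to, here is a minimal l∞ PGD sketch under the same assumptions as the FGSM snippet above (PyTorch model, inputs in [0, 1]; alpha and steps are illustrative hyperparameter names):

import torch
import torch.nn.functional as F

def pgd_linf(model, x, y, eps, alpha, steps):
    # Iterated FGSM steps, each followed by projection back onto the
    # l_inf ball of radius eps around the clean input x.
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)  # project onto the eps-ball
        x_adv = x_adv.clamp(0.0, 1.0)                          # keep pixels valid
    return x_adv.detach()

Each adversarial example costs steps forward and backward passes through the model, which is the overhead the citing paper points to.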
“…Since 2014, when the observation was made that applying small perturbations to inputs can cause dramatic shifts in model outputs [16], the field of adversarial machine learning has been elevated to the forefront of computer vision research, with numerous techniques for generating so-called adversarial attacks being published each year. One such technique, FGSM [8], is based upon the model gradients and serves as the foundation for other techniques such as PGD [13]; this class of adversarial example generation technique generally applies per-pixel perturbations budgeted according to some constraint. Other techniques, such as DeepFool [14] and C&W [2], similarly apply pixel-level changes, often while simultaneously reducing the overall amount of applied perturbation and still maintaining an impressive rate of misclassification. Though many of these pixel-level perturbation techniques are generalizable from the task of image classification to broader applications, there has been some research into adapting these techniques specifically to suit the domain of facial recognition [1].…”
Section: Adversarial Attack Strategies
confidence: 99%
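Since the paper tracked on this page is the C&W attack itself [2], a brief sketch of its commonly cited targeted l2 formulation may help; the notation (logits Z, target class t, confidence κ, trade-off constant c) follows the usual presentation and is an assumption here rather than a quotation from the excerpt:

\min_{\delta}\; \|\delta\|_{2}^{2} + c \cdot f(x + \delta),
\qquad
f(x') = \max\Big(\max_{i \neq t} Z(x')_{i} - Z(x')_{t},\; -\kappa\Big).

The constant c is typically chosen by binary search, and a change of variables keeps x + δ inside the valid pixel box, which is how the attack keeps perturbations small while still achieving the misclassification rates the excerpt describes.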