Maksym Andriushchenko scite author profile

Classifiers used in the wild, in particular for safetycritical systems, should not only have good generalization properties but also should know when they don't know, in particular make low confidence predictions far away from the training data. We show that ReLU type neural networks which yield a piecewise linear classifier function fail in this regard as they produce almost always high confidence predictions far away from the training data. For bounded domains like images we propose a new robust optimization technique similar to adversarial training which enforces low confidence predictions far away from the training data. We show that this technique is surprisingly effective in reducing the confidence of predictions far away from the training data while maintaining high confidence predictions and test error on the original classification task compared to standard training.

show abstract

Square Attack: a query-efficient black-box adversarial attack via random search

Andriushchenko¹,

Croce²,

Flammarion³

et al. 2019

Preprint

View full text Add to dashboard Cite

We propose the Square Attack, a new score-based blackbox l 2 and l ∞ adversarial attack that does not rely on local gradient information and thus is not affected by gradient masking. The Square Attack is based on a randomized search scheme where we select localized square-shaped updates at random positions so that the l ∞ -or l 2 -norm of the perturbation is approximately equal to the maximal budget at each step. Our method is algorithmically transparent, robust to the choice of hyperparameters, and is significantly more query efficient compared to the more complex state-of-the-art methods. In particular, on ImageNet we improve the average query efficiency for various deep networks by a factor of at least 2 and up to 7 compared to the recent state-of-the-art l ∞ -attack of Meunier et al.[34] while having a higher success rate. The Square Attack can even be competitive to gradient-based white-box attacks in terms of success rate. Moreover, we show its utility by breaking a recently proposed defense based on randomization. The code of our attack is available at https: //github.com/max-andr/square-attack. * Equal contribution. Street sign → parking meterSquare Attack (ours) Meunier et al Bandits

show abstract

RobustBench: a standardized adversarial robustness benchmark

Croce¹,

Andriushchenko²,

Sehwag³

et al. 2020

Preprint

View full text Add to dashboard Cite

Evaluation of adversarial robustness is often error-prone leading to overestimation of the true robustness of models. While adaptive attacks designed for a particular defense are a way out of this, there are only approximate guidelines on how to perform them. Moreover, adaptive evaluations are highly customized for particular models, which makes it difficult to compare different defenses. Our goal is to establish a standardized benchmark of adversarial robustness, which as accurately as possible reflects the robustness of the considered models within a reasonable computational budget. This requires to impose some restrictions on the admitted models to rule out defenses that only make gradient-based attacks ineffective without improving actual robustness. We evaluate robustness of models for our benchmark with AutoAttack, an ensemble of white-and black-box attacks which was recently shown in a large-scale study to improve almost all robustness evaluations compared to the original publications. Our leaderboard, hosted at http://robustbench.github.io/, aims at reflecting the current state of the art on a set of well-defined tasks in ∞ -and 2 -threat models with possible extensions in the future. Additionally, we open-source the library http://github.com/RobustBench/robustbench that provides unified access to state-of-the-art robust models to facilitate their downstream applications. Finally, based on the collected models, we analyze general trends in p -robustness and its impact on other tasks such as robustness to various distribution shifts and out-of-distribution detection.

show abstract

On the effectiveness of adversarial training against common corruptions

Kireev¹,

Andriushchenko²,

Flammarion³

2021

Preprint

View full text Add to dashboard Cite

The literature on robustness towards common corruptions shows no consensus on whether adversarial training can improve the performance in this setting. First, we show that, when used with an appropriately selected perturbation radius, p adversarial training can serve as a strong baseline against common corruptions. Then we explain why adversarial training performs better than data augmentation with simple Gaussian noise which has been observed to be a meaningful baseline on common corruptions. Related to this, we identify the σ-overfitting phenomenon when Gaussian augmentation overfits to a particular standard deviation used for training which has a significant detrimental effect on common corruption accuracy. We discuss how to alleviate this problem and then how to further enhance p adversarial training by introducing an efficient relaxation of adversarial training with learned perceptual image patch similarity as the distance metric. Through experiments on CIFAR-10 and ImageNet-100, we show that our approach does not only improve the p adversarial training baseline but also has cumulative gains with data augmentation methods such as AugMix, ANT, and SIN leading to state-of-the-art performance on common corruptions. The code of our experiments is publicly available at https://github.com/tml-epfl/adv-training-corruptions.

show abstract

Understanding and Improving Fast Adversarial Training

Andriushchenko¹,

Flammarion²

2020

Preprint

View full text Add to dashboard Cite

A recent line of work focused on making adversarial training computationally efficient for deep learning models. In particular, Wong et al. [46] showed that ∞ -adversarial training with fast gradient sign method (FGSM) can fail due to a phenomenon called catastrophic overfitting, when the model quickly loses its robustness over a single epoch of training. We show that adding a random step to FGSM, as proposed in [46], does not prevent catastrophic overfitting, and that randomness is not important per se -its main role being simply to reduce the magnitude of the perturbation. Moreover, we show that catastrophic overfitting is not inherent to deep and overparametrized networks, but can occur in a single-layer convolutional network with a few filters. In an extreme case, even a single filter can make the network highly non-linear locally, which is the main reason why FGSM training fails. Based on this observation, we propose a new regularization method, GradAlign, that prevents catastrophic overfitting by explicitly maximizing the gradient alignment inside the perturbation set and improves the quality of the FGSM solution. As a result, GradAlign allows to successfully apply FGSM training also for larger ∞ -perturbations and reduce the gap to multi-step adversarial training. The code of our experiments is available at https://github.com/tml-epfl/understanding-fast-adv-training.

show abstract

Sparse-RS: A Versatile Framework for Query-Efficient Sparse Black-Box Adversarial Attacks

Croce

Andriushchenko

Singh

et al. 2022

AAAI

View full text Add to dashboard Cite

We propose a versatile framework based on random search, Sparse-RS, for score-based sparse targeted and untargeted attacks in the black-box setting. Sparse-RS does not rely on substitute models and achieves state-of-the-art success rate and query efficiency for multiple sparse attack models: L0-bounded perturbations, adversarial patches, and adversarial frames. The L0-version of untargeted Sparse-RS outperforms all black-box and even all white-box attacks for different models on MNIST, CIFAR-10, and ImageNet. Moreover, our untargeted Sparse-RS achieves very high success rates even for the challenging settings of 20x20 adversarial patches and 2-pixel wide adversarial frames for 224x224 images. Finally, we show that Sparse-RS can be applied to generate targeted universal adversarial patches where it significantly outperforms the existing approaches. Our code is available at https://github.com/fra31/sparse-rs.

show abstract

Why ReLU networks yield high-confidence predictions far away from the training data and how to mitigate the problem

Hein¹,

Andriushchenko²,

Bitterwolf³

2018

Preprint

View full text Add to dashboard Cite

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.