2022
DOI: 10.48550/arxiv.2202.13711
Preprint

Evaluating the Adversarial Robustness of Adaptive Test-time Defenses

Abstract: Adaptive defenses that use test-time optimization promise to improve robustness to adversarial examples. We categorize such adaptive test-time defenses and explain their potential benefits and drawbacks. In the process, we evaluate some of the latest proposed adaptive defenses (most of them published at peer-reviewed conferences). Unfortunately, none significantly improve upon static models when evaluated appropriately. Some even weaken the underlying static model while simultaneously increasing inference cost.

Cited by 6 publications (12 citation statements)
References 15 publications
“…As can be seen in Table 1, our method outperforms ADP by up to 32.86%. We should note that the results are lower than those presented in [43]; this was also observed in [6].…”
Section: CIFAR-10 Experiments (contrasting)
confidence: 62%
“…This process is very expensive, both in terms of memory and computation, since the attacker needs to keep the entire computational graph in memory and backpropagate from the classifier through all of the diffusion time steps. [Figure 6: robust accuracy under CIFAR-10-C as a function of the diffusion model's maximal depth T*; methods compared: Ours, Gowal [2], Trades [10], AT [6], PAT [5].] We compare our method with the results reported in [11,45,25,24].…”
Section: Computational Resources (mentioning)
confidence: 99%
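The memory cost described in the statement above can be illustrated with a toy sketch (an illustration only, not code from the cited papers): reverse-mode differentiation through a T-step iterative purifier must retain every intermediate state, so an attacker who backpropagates through the whole purification chain pays memory linear in T. The one-dimensional update x_{t+1} = x_t - step * x_t^3 is an arbitrary stand-in for a diffusion step.

```python
import numpy as np

def purify_forward(x, T, step=0.1):
    """Toy T-step iterative 'purification': x_{t+1} = x_t - step * x_t**3.
    To backpropagate through it, reverse-mode AD must store every
    intermediate x_t, so memory grows linearly with T."""
    trace = [x]
    for _ in range(T):
        x = x - step * x**3
        trace.append(x)
    return x, trace

def purify_backward(trace, grad_out, step=0.1):
    """Manual reverse pass: chain the local Jacobian
    dx_{t+1}/dx_t = 1 - 3 * step * x_t**2
    backwards through all stored intermediates."""
    g = grad_out
    for x_t in reversed(trace[:-1]):
        g = g * (1.0 - 3.0 * step * x_t**2)
    return g

x0 = np.array([0.5])
y, trace = purify_forward(x0, T=50)
g = purify_backward(trace, np.ones_like(y))
# trace holds T + 1 states: the O(T) memory an attacker pays for the full graph.
```

A gradient-checkpointing attacker could trade this memory for extra forward passes, but the compute cost of attacking a deep purifier remains high either way.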
“…On various strong adaptive-attack benchmarks, we then compare our method with state-of-the-art adversarial training and adversarial purification methods (Sections 5.2 to 5.4). For completeness, we defer the results against standard (i.e., non-adaptive) and black-box attacks, suggested by Croce et al. (2022), to Appendix C.1. Next, we perform various ablation studies to provide better insights into our method (Section 5.5).…”
Section: Methods (mentioning)
confidence: 99%
“…In general, adaptive attacks are considered stronger than standard (i.e., non-adaptive) attacks. Following the checklist of Croce et al. (2022), we report the performance of DiffPure under standard attacks in Table 8. We can see that 1) AutoAttack is effective on the static model, as its robust accuracies are zero, and 2) standard attacks are not effective on our method, as our robust accuracies against standard attacks are much better than those against adaptive attacks (ref.…”
Section: C.1 Robust Accuracies of Our Method for Standard Attack and ... (mentioning)
confidence: 99%
“…In our work, the attacker's capabilities are defined in a top-down fashion, as in Table 2. As a randomness-based defense, our method is considered relatively vulnerable to EOT attacks [22,54]. It is worth noting that in our attack-scenario setting, different attackers have different degrees of knowledge of the ensemble model library.…”
Section: Attack Scenarios (mentioning)
confidence: 99%
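The EOT (Expectation over Transformation) attacks mentioned in the statement above can be sketched minimally (a hedged toy example, not code from [22] or [54]): against a randomized defense, EOT averages the loss gradient over draws of the defense's internal randomness, yielding an unbiased estimate of the gradient of the expected loss. The quadratic toy loss and Gaussian noise here are assumptions chosen so the correct answer is known in closed form.

```python
import numpy as np

rng = np.random.default_rng(0)

def loss_grad(x, noise):
    # Toy differentiable "randomized model": loss = ||x + noise||^2 / 2,
    # so the gradient with respect to x is simply (x + noise).
    return x + noise

def eot_gradient(x, sigma=0.1, n_samples=256):
    """Expectation over Transformation: average the loss gradient over
    draws of the defense's randomness (here, Gaussian noise), giving a
    Monte Carlo estimate of grad_x E[loss(x, noise)]."""
    grads = [loss_grad(x, rng.normal(0.0, sigma, size=x.shape))
             for _ in range(n_samples)]
    return np.mean(grads, axis=0)

x = np.array([1.0, -2.0])
g = eot_gradient(x)
# With zero-mean noise, E[grad] = x, so g should be close to [1.0, -2.0].
```

An attacker then takes a standard PGD step along this averaged gradient; the larger `n_samples` is, the less the defense's randomness can mask the true descent direction.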