Abstract: Adversarial purification refers to a class of defense methods that remove adversarial perturbations using a generative model. These methods make no assumptions about the form of the attack or the classification model, and thus can defend pre-existing classifiers against unseen threats. However, their performance currently falls behind adversarial training methods. In this work, we propose DiffPure, which uses diffusion models for adversarial purification: given an adversarial example, we first diffuse it with a small amount of noise following a forward diffusion process, and then recover the clean image through a reverse generative process.
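The two-step recipe in the abstract (forward-diffuse, then reverse-denoise) is compact enough to sketch. Below is a minimal, hypothetical PyTorch sketch, assuming a pretrained noise-prediction network `eps_model(x_t, t)` and a standard linear beta schedule; the paper itself solves a reverse SDE and uses the adjoint method for gradients, whereas this sketch uses plain DDPM ancestral sampling. `t_star`, the amount of noise added before purification, is a tunable hyperparameter.

```python
import torch

def purify(x_adv, eps_model, t_star=100, num_steps=1000):
    # Linear beta schedule (assumed; must match the pretrained model's schedule).
    betas = torch.linspace(1e-4, 0.02, num_steps)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    # Forward diffusion: perturb the adversarial input with a small amount of
    # noise in one shot, up to timestep t_star.
    noise = torch.randn_like(x_adv)
    x_t = alpha_bars[t_star].sqrt() * x_adv + (1 - alpha_bars[t_star]).sqrt() * noise

    # Reverse generative process: DDPM ancestral sampling from t_star down to 0.
    for t in range(t_star, -1, -1):
        t_batch = torch.full((x_t.shape[0],), t, dtype=torch.long)
        eps = eps_model(x_t, t_batch)
        mean = (x_t - betas[t] / (1 - alpha_bars[t]).sqrt() * eps) / alphas[t].sqrt()
        x_t = mean + betas[t].sqrt() * torch.randn_like(x_t) if t > 0 else mean
    return x_t  # purified image; feed to the unchanged downstream classifier
```

The key design choice is `t_star`: large enough that the added noise washes out the adversarial perturbation, small enough that the reverse process preserves the image's semantics.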
“…The work of [15] first studied this problem; however, they do not obtain significant accuracy improvements, likely because the diffusion models available at the time were not good enough. Separately, [18] suggest that diffusion models might be able to provide strong empirical robustness to adversarial examples, as evaluated by robustness under adversarial attacks computed using existing attack algorithms; this is orthogonal to our results.…”
In this paper we show how to achieve state-of-the-art certified adversarial robustness to ℓ2-norm bounded perturbations by relying exclusively on off-the-shelf pretrained models. To do so, we instantiate the denoised smoothing approach of Salman et al. by combining a pretrained denoising diffusion probabilistic model and a standard high-accuracy classifier. This allows us to certify 71% accuracy on ImageNet under adversarial perturbations constrained to be within an ℓ2 norm of ε = 0.5, an improvement of 14 percentage points over the prior certified SoTA using any approach, or an improvement of 30 percentage points over denoised smoothing. We obtain these results using only pretrained diffusion models and image classifiers, without requiring any fine-tuning or retraining of model parameters.
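A hedged sketch of the pipeline this abstract describes: match the randomized-smoothing noise level σ to a diffusion timestep, denoise each noisy sample in one shot with the pretrained model, and let the frozen classifier vote. `eps_model` and `clf` are placeholder names, not the authors' code; the σ-to-timestep mapping follows from equating the smoothing noise N(0, σ²) with the diffusion marginal x_t = √ᾱ_t·x + √(1−ᾱ_t)·ε.

```python
import torch

def find_timestep(sigma, alpha_bars):
    # Diffusion at step t has x_t = sqrt(a_t) x + sqrt(1 - a_t) eps,
    # i.e. an effective noise variance of (1 - a_t) / a_t after rescaling.
    ratios = (1 - alpha_bars) / alpha_bars
    return int(torch.argmin((ratios - sigma ** 2).abs()))

def denoised_prediction(x, sigma, eps_model, clf, alpha_bars):
    t = find_timestep(sigma, alpha_bars)
    a_t = alpha_bars[t]
    # One smoothing noise draw, rescaled into the diffusion convention.
    x_noisy = x + sigma * torch.randn_like(x)
    x_t = a_t.sqrt() * x_noisy
    # One-shot denoising: predict the clean image directly from x_t.
    t_batch = torch.full((x.shape[0],), t, dtype=torch.long)
    x0_hat = (x_t - (1 - a_t).sqrt() * eps_model(x_t, t_batch)) / a_t.sqrt()
    # The frozen classifier votes on the denoised image; repeating this many
    # times and aggregating votes yields the randomized-smoothing certificate.
    return clf(x0_hat).argmax(dim=-1)
```

Note that nothing here is trained: the diffusion model and the classifier are both used exactly as released, which is the "for free" part of the result.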
“…The robustness continues to scale with model capacity, and RobArch-L achieves the new SOTA AA accuracy on the RobustBench leaderboard. It is important to note that ResNet-50+DiffPure [39] designed a novel AT method by using diffusion models [22] for adversarial purification. Although that method improves the AA accuracy by 5.97 percentage points, our architecture modifications show stronger robustness even without fine-tuning the Standard-AT method.…”
Adversarial Training is the most effective approach for improving the robustness of Deep Neural Networks (DNNs). However, compared to the large body of research in optimizing the adversarial training process, there are few investigations into how architecture components affect robustness, and they rarely constrain model capacity. Thus, it is unclear where robustness precisely comes from. In this work, we present the first large-scale systematic study on the robustness of DNN architecture components under fixed parameter budgets. Through our investigation, we distill 18 actionable robust network design guidelines that empower model developers to gain deep insights. We demonstrate these guidelines' effectiveness by introducing the novel Robust Architecture (RobArch) model that instantiates the guidelines to build a family of top-performing models across parameter capacities against strong adversarial attacks. RobArch achieves the new state-of-the-art AutoAttack accuracy on the RobustBench ImageNet leaderboard. The code is available at https://github.com/ShengYun-Peng/RobArch.
“…Existing research has explored adversarial examples for different generative models, yet no proper framework has been formulated. Diffusion models are used to improve the adversarial robustness of classifiers (Nie et al., 2022). Kos et al. (2018)…”
Section: Related Work
confidence: 99%
“…Furthermore, the training objective of diffusion models is optimized indirectly through a variational bound and thus is not directly applicable to the optimization of adversarial examples. For these reasons, existing research only considers diffusion models as aids to improve the robustness of classifiers (Nie et al., 2022), leaving a gap in the formulation of adversarial examples for diffusion models.…”
Diffusion Models (DMs) achieve state-of-the-art performance in generative tasks, boosting a wave in AI for Art. Despite their commercial success, DMs meanwhile provide tools for copyright violations, where infringers benefit from illegally using paintings created by human artists to train DMs and generate novel paintings in a similar style. In this paper, we show that it is possible to create an image x′ that is similar to an image x for human vision but unrecognizable for DMs. We build a framework to define and evaluate this adversarial example for diffusion models. Based on the framework, we further propose AdvDM, an algorithm to generate adversarial examples for DMs. By optimizing upon different latent variables sampled from the reverse process of DMs, AdvDM conducts a Monte-Carlo estimation of adversarial examples for DMs. Extensive experiments show that the estimated adversarial examples can effectively hinder DMs from extracting their features. Our method can be a powerful tool for human artists to protect their copyright against infringers with DM-based AI-for-Art applications.
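Read literally, AdvDM's core loop is projected gradient ascent on the diffusion training loss, Monte-Carlo sampled over timesteps and noise draws. The sketch below is an illustrative reconstruction under that reading, not the authors' implementation; `eps_model`, the ℓ∞ budget, and the step size are assumed placeholders.

```python
import torch
import torch.nn.functional as F

def advdm_attack(x, eps_model, alpha_bars, budget=8/255, step=1/255, iters=40):
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(iters):
        # Monte-Carlo sample: one random timestep and noise draw per iteration.
        t = torch.randint(0, len(alpha_bars), (x.shape[0],))
        noise = torch.randn_like(x)
        a_t = alpha_bars[t].view(-1, 1, 1, 1)
        x_t = a_t.sqrt() * (x + delta) + (1 - a_t).sqrt() * noise
        # Diffusion training objective (noise-prediction MSE); ascending it
        # pushes (x + delta) off the model's learned data manifold.
        loss = F.mse_loss(eps_model(x_t, t), noise)
        loss.backward()
        with torch.no_grad():
            delta += step * delta.grad.sign()
            delta.clamp_(-budget, budget)  # l_inf projection onto the budget
            delta.grad.zero_()
    # A real attack would also clamp x + delta to the valid pixel range.
    return (x + delta).detach()
```

The design mirrors standard PGD, except the loss being ascended is the generative model's own training objective rather than a classifier's cross-entropy, which is why no labels are needed.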