Abstract: Deep Neural Networks (DNNs) are highly sensitive to imperceptible malicious perturbations, known as adversarial attacks. Following the discovery of this vulnerability in real-world imaging and vision applications, the associated safety concerns have attracted vast research attention, and many defense techniques have been developed. Most of these defense methods rely on adversarial training (AT): training the classification network on images perturbed according to a specific threat model, which defines the magn…
“…Our future work may focus on several promising directions: (i) generalizing this technique for obtaining better gradients from multi-modal networks such as CLIP (Radford et al. 2021), which help guide text-to-image diffusion models (Ramesh et al. 2022); (ii) implementing robust classifier guidance beyond diffusion models, e.g., for use in classifier-guided GAN training (Sauer, Schwarz, and Geiger 2022); (iii) extending our proposed technique to unlabeled datasets; and (iv) seeking better sources of perceptually aligned gradients (Ganz, Kawar, and Elad 2022), so as to better guide the generative diffusion process.…”
Section: Discussion (mentioning)
confidence: 99%
“…These methods have demonstrated unprecedented realism and mode coverage in synthesized images, achieving state-of-the-art results (Dhariwal and Nichol 2021; Song et al. 2021; Vahdat, Kreis, and Kautz 2021) in well-known metrics such as the Fréchet Inception Distance (FID) (Heusel et al. 2017). In addition to image generation, these techniques have also been successful in a multitude of downstream applications such as image restoration (Kawar, Vaksman, and Elad 2021a; Kawar et al. 2022), unpaired image-to-image translation (Sasaki, Willcocks, and Breckon 2021), image segmentation (Amit et al. 2021), image editing (Liu et al. 2021; Avrahami, Lischinski, and Fried 2022), text-to-image generation (Ramesh et al. 2022; Saharia et al. 2022), and more applications in image processing (Theis et al. 2022; Gao et al. 2022; Nie et al. 2022; Blau et al. 2022; Han, Zheng, and Zhou 2022) and beyond (Jeong et al. 2021; Chen et al. 2022; Ho et al. 2022b; Zhou, Du, and Wu 2021).…”
Denoising diffusion probabilistic models (DDPMs) are a recent family of generative models that achieve state-of-the-art results. In order to obtain class-conditional generation, it was suggested to guide the diffusion process by gradients from a time-dependent classifier. While the idea is theoretically sound, deep learning-based classifiers are infamously susceptible to gradient-based adversarial attacks. Therefore, while traditional classifiers may achieve good accuracy scores, their gradients are possibly unreliable and might hinder the improvement of the generation results. Recent work discovered that adversarially robust classifiers exhibit gradients that are aligned with human perception, and these could better guide a generative process towards semantically meaningful images. We utilize this observation by defining and training a time-dependent adversarially robust classifier, and using it as guidance for a generative diffusion model. In experiments on the highly challenging and diverse ImageNet dataset, our scheme introduces significantly more intelligible intermediate gradients, better alignment with theoretical findings, and improved generation results under several evaluation metrics. Furthermore, we conduct an opinion survey whose findings indicate that human raters prefer our method's results.
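The guidance mechanism this abstract describes, shifting the diffusion model's predicted mean by the classifier's gradient, can be sketched in a few lines. The following is a toy, self-contained illustration (not the paper's implementation): `denoiser_mean` and `classifier_grad` are analytic stand-ins for the learned diffusion model and the time-dependent classifier, and the class "prototype" `y` is an assumption made so the classifier gradient has a closed form.

```python
import numpy as np

# Toy sketch of classifier-guided diffusion sampling: the classifier's
# gradient grad_x log p(y | x_t) shifts the denoiser's predicted mean.
# Both networks are replaced by analytic stand-ins here.

def denoiser_mean(x_t, t):
    # Stand-in for the diffusion model's posterior mean mu_theta(x_t, t).
    return 0.9 * x_t

def classifier_grad(x_t, t, y):
    # Stand-in for grad_x log p(y | x_t): the gradient of a Gaussian
    # log-likelihood centered at a class "prototype" y.
    return -(x_t - y)

def guided_step(x_t, t, y, sigma=0.1, scale=10.0, rng=None):
    rng = np.random.default_rng(t) if rng is None else rng
    mu = denoiser_mean(x_t, t)
    # Guidance: shift the mean by the scaled classifier gradient.
    mu_guided = mu + scale * sigma**2 * classifier_grad(x_t, t, y)
    return mu_guided + sigma * rng.standard_normal(x_t.shape)

x = np.random.default_rng(123).standard_normal(4)
y = np.ones(4)  # class "prototype" for the toy classifier
for t in reversed(range(50)):
    x = guided_step(x, t, y)
print(np.round(x, 2))
```

With the guidance term enabled, the iterates settle near the class prototype; setting `scale=0` recovers unconditional sampling from the toy denoiser.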
“…Diffusion models [18,51,53,58] are a family of generative models that have recently gained traction, as they have advanced the state of the art in image generation [12,26,54,57], and have been deployed in various downstream applications such as image restoration [25,45], adversarial purification [10,34], image compression [55], image classification [61], and others [14,27,37,48,59].…”
Text-conditioned image editing has recently attracted considerable interest. However, most methods are currently either limited to specific editing types (e.g., object overlay, style transfer), or apply to synthetically generated images, or require multiple input images of a common object. In this paper we demonstrate, for the very first time, the ability to apply complex (e.g., non-rigid) text-guided semantic edits to a single real image. For example, we can change the posture and composition of one or multiple objects inside an image, while preserving its original characteristics. Our method can make a standing dog sit down or jump, cause a bird to spread its wings, etc., each within its single high-resolution natural image provided by the user. Contrary to previous work, our proposed method requires only a single input image and a target text (the desired edit). It operates on real images, and does not require any additional inputs (such as image masks or additional views of the object). Our method, which we call "Imagic", leverages a pretrained text-to-image diffusion model for this task. It produces a text embedding that aligns with both the input image and the target text, while fine-tuning the diffusion model to capture the image-specific appearance. We demonstrate the quality and versatility of our method on numerous inputs from various domains, showcasing a plethora of high-quality complex semantic image edits, all within a single unified framework.
(˚Equal contribution. The first author performed this work as an intern at Google Research.)
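The embedding-optimization idea in this abstract, producing a text embedding aligned with both the input image and the target text, can be illustrated with a deliberately tiny sketch. Everything here is a stand-in: the linear "generator", the quadratic reconstruction loss, and all names are assumptions for illustration, not Imagic's actual components.

```python
import numpy as np

# Toy sketch: starting from the target text's embedding, take gradient steps
# so that a (linear, toy) generator conditioned on the embedding reconstructs
# the input image.

rng = np.random.default_rng(0)
D_IMG, D_EMB = 32, 8

W = 0.3 * rng.standard_normal((D_IMG, D_EMB))  # stand-in "generator" weights
generate = lambda e: W @ e

image = rng.standard_normal(D_IMG)   # the user-provided input image
e_text = rng.standard_normal(D_EMB)  # embedding of the target text

e_opt = e_text.copy()
for _ in range(500):                        # optimize the embedding
    grad = W.T @ (generate(e_opt) - image)  # d/de of 0.5 * ||G(e) - image||^2
    e_opt -= 0.05 * grad

# The second stage described in the abstract (fine-tuning the generator on
# the input image) is omitted in this toy sketch.
print(round(float(np.linalg.norm(generate(e_opt) - image)), 3))
```

After optimization, `e_opt` reconstructs the image better than the raw text embedding `e_text` does, which is the property the real method exploits at much larger scale.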
“…Nevertheless, these iterative algorithms are still considerably slower than GANs, so substantial work has been invested in improving their speed without compromising significantly on generation quality [258,135,247], often achieving impressive speedup levels. Diffusion models have since become ubiquitous in many applications [142,209,21,116,6,253,254,144], prompting researchers to prepare surveys of their impact on the image processing field and beyond [315,60,36].
Figure 8.1: Temporal steps along 3 independent synthesis paths of the Annealed Langevin Dynamics [260] algorithm, using a denoiser [261] trained on LSUN bedroom [319] images.…”
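The Annealed Langevin Dynamics algorithm referenced in the figure caption above can be sketched on a toy 2-D Gaussian target, where the score ∇ₓ log p(x) has a closed form. In the survey's setting the score would instead come from a trained denoiser; the target, schedule, and step-size rule here are illustrative assumptions.

```python
import numpy as np

# Toy sketch of Annealed Langevin Dynamics: run Langevin updates at a
# sequence of decreasing noise levels, with the step size shrinking
# proportionally to the squared noise level.

TARGET_MEAN = np.array([2.0, -1.0])

def score(x, sigma):
    # Score of the noise-perturbed target N(TARGET_MEAN, (1 + sigma^2) I).
    return -(x - TARGET_MEAN) / (1.0 + sigma**2)

def annealed_langevin(sigmas, steps_per_level=100, eps=0.5, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(2)
    for sigma in sigmas:  # anneal from large noise level to small
        alpha = eps * sigma**2  # step size shrinks with the noise level
        for _ in range(steps_per_level):
            x = x + 0.5 * alpha * score(x, sigma) \
                  + np.sqrt(alpha) * rng.standard_normal(2)
    return x

sample = annealed_langevin(np.geomspace(10.0, 0.1, 10))
print(np.round(sample, 2))  # should land in the vicinity of TARGET_MEAN
```

The large early noise levels let the chain explore broadly; the small final levels refine the sample toward the target distribution, mirroring the three synthesis paths shown in the figure.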
Section: Regularization By Denoising (RED) (mentioning)
Image denoising, the removal of additive white Gaussian noise from an image, is one of the oldest and most studied problems in image processing. Extensive work over several decades has led to thousands of papers on this subject, and to many well-performing algorithms for this task. Indeed, ten years ago, these achievements led some researchers to suspect that "Denoising is Dead", in the sense that all that can be achieved in this domain has already been obtained. However, this turned out to be far from the truth, as deep learning (DL) penetrated the realm of image processing. The era of DL brought a revolution to image denoising, both by taking the lead in today's ability for noise suppression in images, and by broadening the scope of denoising problems being treated. Our paper starts by describing this evolution, highlighting in particular the tension and synergy that exist between classical approaches and modern Artificial Intelligence (AI) alternatives in the design of image denoisers.
The recent transitions in the field of image denoising go far beyond the ability to design better denoisers. In the second part of this paper we focus on recently discovered abilities and prospects of image denoisers. We expose the possibility of using image denoisers in the service of other problems, such as regularizing general inverse problems and serving as the prime engine in diffusion-based image synthesis. We also unveil the (strange?) idea that denoising and other inverse problems might not have a unique solution, as common algorithms would have us believe. Instead, we describe constructive ways to produce randomized and diverse high perceptual quality results for inverse problems, all fueled by the progress that DL brought to image denoising. This is a survey paper, and its prime goal is to provide a broad view of the history of the field of image denoising and closely related topics in image processing.
Our aim is to give better context to recent discoveries, and to the influence of the AI revolution on our domain.
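The "denoisers as regularizers" idea named in the section heading above (RED) can be illustrated with a toy, self-contained sketch: the prior term is λ/2 · xᵀ(x − D(x)), whose gradient, under RED's assumptions on the denoiser D, reduces to λ(x − D(x)). The box-filter "denoiser", the identity forward operator, and all parameter values below are illustrative assumptions.

```python
import numpy as np

# Toy sketch of Regularization by Denoising (RED) via gradient descent on
#   0.5 * ||x - y||^2  +  (lam / 2) * x^T (x - D(x)),
# where D is a crude 3-tap moving-average "denoiser" on a 1-D signal.

def box_denoise(x):
    # 3-tap moving average with replicated boundaries.
    padded = np.pad(x, 1, mode="edge")
    return (padded[:-2] + padded[1:-1] + padded[2:]) / 3.0

def red_gradient_descent(y, lam=0.5, mu=0.2, iters=200):
    x = y.copy()
    for _ in range(iters):
        # Data-fidelity gradient plus RED's prior gradient lam * (x - D(x)).
        grad = (x - y) + lam * (x - box_denoise(x))
        x = x - mu * grad
    return x

rng = np.random.default_rng(0)
clean = np.sin(np.linspace(0, 2 * np.pi, 64))
y = clean + 0.3 * rng.standard_normal(64)
x_hat = red_gradient_descent(y)
print(float(np.mean((y - clean) ** 2)), float(np.mean((x_hat - clean) ** 2)))
```

Even with this crude denoiser, the RED estimate has lower mean-squared error than the noisy input; swapping in a stronger denoiser D, with the rest unchanged, is exactly the plug-in flexibility the survey highlights.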