2018
DOI: 10.48550/arxiv.1811.11304
Preprint

Universal Adversarial Training

Abstract: Standard adversarial attacks change the predicted class label of an image by adding specially tailored small perturbations to its pixels. In contrast, a universal perturbation is an update that can be added to any image in a broad class of images, while still changing the predicted class label. We study the efficient generation of universal adversarial perturbations, and also efficient methods for hardening networks to these attacks. We propose a simple optimization-based universal attack that reduces the top-…

Cited by 23 publications (36 citation statements)
References 21 publications
“…Existing defenses for adversarial mask UAPs denoise the input or retrain the model to correct its output on perturbed inputs [2,5,30,35]. These approaches can correct model predictions on tested universal attacks without having to intervene during model inference.…”
Section: Comparison With Existing Defenses
confidence: 99%
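The retraining defenses this passage describes can be pictured as an alternating min-max loop: ascend the loss with respect to one shared perturbation δ, then descend it with respect to the model weights. Below is a minimal PyTorch sketch under assumed inputs (a classifier `model`, a `DataLoader`, an `optimizer`); the schedule and hyperparameters are illustrative, not the exact procedure of any cited work.

```python
import torch

def universal_adversarial_training(model, loader, optimizer,
                                   eps=0.03, delta_lr=0.01, epochs=1, device="cpu"):
    """Alternating min-max sketch: ascend the loss w.r.t. one shared
    universal perturbation delta, then descend it w.r.t. the weights."""
    criterion = torch.nn.CrossEntropyLoss()
    delta = None  # created lazily so it matches the input shape
    model.to(device).train()
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            if delta is None:
                delta = torch.zeros_like(x[:1], requires_grad=True)
            # 1) ascent on delta: maximize the batch loss (signed gradient step)
            loss = criterion(model(x + delta), y)
            grad, = torch.autograd.grad(loss, delta)
            with torch.no_grad():
                delta += delta_lr * grad.sign()
                delta.clamp_(-eps, eps)  # project onto the l_inf ball
            # 2) descent on the weights against the perturbed batch
            optimizer.zero_grad()
            criterion(model(x + delta.detach()), y).backward()
            optimizer.step()
    return delta.detach()
```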
“…Adversarial masks are usually generated by directly optimizing over the model's training loss function. These direct attacks are effective, but require white-box access to the model [10,27,35]. Effective UAP attacks can also be achieved with Stochastic Gradient Descent (SGD), which applies the Projected Gradient Descent (PGD) [26] update but optimizes over batches rather than single inputs [30,35,41].…”
Section: Introduction
confidence: 99%
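The batch-based attack the passage describes can be sketched as follows: a single δ is updated with a signed-gradient (PGD-style) ascent step on the training loss over mini-batches, then projected back onto the ℓ∞ ball after every step. A minimal PyTorch sketch, assuming a trained `model` and a `DataLoader`; names and hyperparameters are illustrative.

```python
import torch

def sgd_uap(model, loader, eps=0.03, step_size=0.005, epochs=5, device="cpu"):
    """SGD-based UAP sketch: one shared delta, updated by signed-gradient
    ascent on the training loss over mini-batches, then projected back
    onto the l_inf ball of radius eps."""
    model.to(device).eval()
    criterion = torch.nn.CrossEntropyLoss()
    delta = None
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            if delta is None:
                delta = torch.zeros_like(x[:1], requires_grad=True)
            loss = criterion(model(x + delta), y)  # untargeted: raise the loss
            grad, = torch.autograd.grad(loss, delta)
            with torch.no_grad():
                delta += step_size * grad.sign()  # PGD-style ascent step
                delta.clamp_(-eps, eps)           # projection: ||delta||_inf <= eps
    return delta.detach()
```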
“…An untargeted UAP is an adversarial perturbation δ ∈ ℝⁿ that satisfies F(x + δ) ≠ τ(x) for sufficiently many x ∈ X and with ‖δ‖_p < ε [18]. The most effective UAP attacks for generating δ under ℓ_p-norm constraints can be achieved by maximizing the loss Σᵢ L(xᵢ + δ) with an iterative stochastic gradient descent algorithm [5,23,19,27]. Here, L is the model's training loss, {xᵢ} are batches of inputs, and δ is a small perturbation satisfying ‖δ‖_p < ε.…”
Section: Universal Adversarial Perturbations
confidence: 99%
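The "sufficiently many x" in this definition is typically quantified as a fooling rate. A minimal PyTorch sketch of that evaluation, measured here against the model's clean predictions; if the quoted τ(x) denotes ground-truth labels rather than clean predictions, compare against the loader's labels instead. All names are illustrative.

```python
import torch

@torch.no_grad()
def fooling_rate(model, loader, delta, device="cpu"):
    """Fraction of inputs whose prediction changes under the universal
    perturbation, i.e. how often F(x + delta) != F(x)."""
    model.to(device).eval()
    delta = delta.to(device)
    flipped, total = 0, 0
    for x, _ in loader:
        x = x.to(device)
        clean = model(x).argmax(dim=1)
        perturbed = model(x + delta).argmax(dim=1)
        flipped += (perturbed != clean).sum().item()
        total += x.size(0)
    return flipped / total
```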