2018
DOI: 10.48550/arxiv.1811.11304
Preprint

Universal Adversarial Training

Abstract: Standard adversarial attacks change the predicted class label of an image by adding specially tailored small perturbations to its pixels. In contrast, a universal perturbation is an update that can be added to any image in a broad class of images, while still changing the predicted class label. We study the efficient generation of universal adversarial perturbations, and also efficient methods for hardening networks to these attacks. We propose a simple optimization-based universal attack that reduces the top-…

Cited by 23 publications (36 citation statements)
References 21 publications
“…Existing defenses for adversarial mask UAPs denoise the input or retrain the model to correct its output on perturbed inputs [2,5,30,35]. These approaches can correct model predictions on tested universal attacks without having to intervene during model inference.…”
Section: Comparison With Existing Defenses
confidence: 99%
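The retraining defenses this passage describes can be pictured as an alternating min-max loop: ascend the loss with respect to one shared perturbation δ, then descend it with respect to the model weights. Below is a minimal PyTorch sketch under assumed inputs (a classifier `model`, a `DataLoader`, an `optimizer`); the schedule and hyperparameters are illustrative, not the exact procedure of any cited work.

```python
import torch

def universal_adversarial_training(model, loader, optimizer,
                                   eps=0.03, delta_lr=0.01, epochs=1, device="cpu"):
    """Alternating min-max sketch: ascend the loss w.r.t. one shared
    universal perturbation delta, then descend it w.r.t. the weights."""
    criterion = torch.nn.CrossEntropyLoss()
    delta = None  # created lazily so it matches the input shape
    model.to(device).train()
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            if delta is None:
                delta = torch.zeros_like(x[:1], requires_grad=True)
            # 1) ascent on delta: maximize the batch loss (signed gradient step)
            loss = criterion(model(x + delta), y)
            grad, = torch.autograd.grad(loss, delta)
            with torch.no_grad():
                delta += delta_lr * grad.sign()
                delta.clamp_(-eps, eps)  # project onto the l_inf ball
            # 2) descent on the weights against the perturbed batch
            optimizer.zero_grad()
            criterion(model(x + delta.detach()), y).backward()
            optimizer.step()
    return delta.detach()
```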
“…Adversarial masks are usually generated by directly optimizing over the model's training loss function. These direct attacks are effective, but require white-box access to the model [10,27,35]. Effective UAP attacks can also be achieved with Stochastic Gradient Descent (SGD), which applies the Projected Gradient Descent (PGD) [26] update but optimizes over batches rather than single inputs [30,35,41].…”
Section: Introduction
confidence: 99%
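The batch-based attack the passage describes can be sketched as follows: a single δ is updated with a signed-gradient (PGD-style) ascent step on the training loss over mini-batches, then projected back onto the ℓ∞ ball after every step. A minimal PyTorch sketch, assuming a trained `model` and a `DataLoader`; names and hyperparameters are illustrative.

```python
import torch

def sgd_uap(model, loader, eps=0.03, step_size=0.005, epochs=5, device="cpu"):
    """SGD-based UAP sketch: one shared delta, updated by signed-gradient
    ascent on the training loss over mini-batches, then projected back
    onto the l_inf ball of radius eps."""
    model.to(device).eval()
    criterion = torch.nn.CrossEntropyLoss()
    delta = None
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            if delta is None:
                delta = torch.zeros_like(x[:1], requires_grad=True)
            loss = criterion(model(x + delta), y)  # untargeted: raise the loss
            grad, = torch.autograd.grad(loss, delta)
            with torch.no_grad():
                delta += step_size * grad.sign()  # PGD-style ascent step
                delta.clamp_(-eps, eps)           # projection: ||delta||_inf <= eps
    return delta.detach()
```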
“…An untargeted UAP is an adversarial perturbation δ ∈ ℝⁿ that satisfies F(x + δ) ≠ τ(x) for sufficiently many x ∈ X and with ‖δ‖_p < ε [18]. The most effective UAP attacks for generating δ under ℓ_p-norm constraints can be achieved by maximizing the loss Σᵢ L(xᵢ + δ) with an iterative stochastic gradient descent algorithm [5,23,19,27]. Here, L is the model's training loss, {xᵢ} are batches of inputs, and δ is a small perturbation satisfying ‖δ‖_p < ε.…”
Section: Universal Adversarial Perturbations
confidence: 99%
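The "sufficiently many x" in this definition is typically quantified as a fooling rate. A minimal PyTorch sketch of that evaluation, measured here against the model's clean predictions; if the quoted τ(x) denotes ground-truth labels rather than clean predictions, compare against the loader's labels instead. All names are illustrative.

```python
import torch

@torch.no_grad()
def fooling_rate(model, loader, delta, device="cpu"):
    """Fraction of inputs whose prediction changes under the universal
    perturbation, i.e. how often F(x + delta) != F(x)."""
    model.to(device).eval()
    delta = delta.to(device)
    flipped, total = 0, 0
    for x, _ in loader:
        x = x.to(device)
        clean = model(x).argmax(dim=1)
        perturbed = model(x + delta).argmax(dim=1)
        flipped += (perturbed != clean).sum().item()
        total += x.size(0)
    return flipped / total
```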