2019 International Joint Conference on Neural Networks (IJCNN)
DOI: 10.1109/ijcnn.2019.8852285

Detecting Adversarial Perturbations Through Spatial Behavior in Activation Spaces

Abstract: Neural-network-based classifiers are still prone to manipulation through adversarial perturbations. State-of-the-art attacks can overcome most of the defense or detection mechanisms suggested so far, and adversaries have the upper hand in this arms race. Adversarial examples are designed to resemble the normal input from which they were constructed, while triggering an incorrect classification. This basic design goal leads to a characteristic spatial behavior within the context of Activation Spaces, a term coin…
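The abstract's notion of per-layer "activation spaces" can be made concrete with a small sketch. The snippet below is an illustration, not the authors' code: it uses PyTorch forward hooks on a toy MLP to capture each hidden layer's output, which is the kind of representation a spatial analysis would operate on. The architecture, layer names, and dummy input are assumptions.

```python
# Illustrative sketch: expose per-layer "activation spaces" with forward hooks.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 256), nn.ReLU(),
    nn.Linear(256, 128), nn.ReLU(),
    nn.Linear(128, 10),
)

activations = {}

def make_hook(name):
    def hook(module, inputs, output):
        # Store a detached copy of this layer's output for later analysis.
        activations[name] = output.detach()
    return hook

# Register one hook per ReLU so each hidden "activation space" is captured.
for idx, module in enumerate(model):
    if isinstance(module, nn.ReLU):
        module.register_forward_hook(make_hook(f"relu_{idx}"))

x = torch.randn(4, 1, 28, 28)      # dummy batch standing in for real inputs
logits = model(x)
for name, act in activations.items():
    print(name, tuple(act.shape))  # e.g. relu_2 (4, 256), relu_4 (4, 128)
```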

Cited by 19 publications (10 citation statements). References 32 publications.
“…The aim of activation output clustering is to detect anomalous input by analyzing the outputs of a certain hidden layer (usually the last) based on the belief that the normal and anomalous inputs are significantly different in a certain space [86], [87]. The anomalous input can be adversarial input of evasion attack or an input with triggers for poisoning attack, such that the technology is defensive against backdoor poisoning attack and evasion attack and is validated for DDMs [87], [88], [89]. However, this technology does not work for DGMs.…”
Section: Activation Output Clustering (mentioning, confidence: 99%)
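A minimal sketch of the activation-output-clustering idea described above, assuming the last hidden layer's activations are already available as NumPy arrays. The array names, cluster count, and percentile threshold are illustrative assumptions, not taken from [86]–[89].

```python
# Illustrative sketch: cluster clean activations, flag inputs far from all clusters.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
clean_acts = rng.normal(size=(1000, 128))   # placeholder clean activations
test_acts = rng.normal(size=(10, 128))      # placeholder incoming activations

# Cluster the clean activations; anomalous inputs are expected to fall far
# from every cluster centre.
kmeans = KMeans(n_clusters=10, n_init=10, random_state=0).fit(clean_acts)

# Distance of each clean sample to its nearest centroid calibrates a threshold.
clean_dist = kmeans.transform(clean_acts).min(axis=1)
threshold = np.percentile(clean_dist, 99)   # illustrative 99th-percentile cut-off

test_dist = kmeans.transform(test_acts).min(axis=1)
is_anomalous = test_dist > threshold
print(is_anomalous)
```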
“…Metzen et al. implemented deep neural networks with a small "detector" sub-network trained on the binary classification task of distinguishing factual data from data containing adversarial perturbations [56]. The same year, Madry et al. [55] [44]. A different notable strategy was taken by Pang et al., who used a thresholding approach as the detector to filter out adversarial examples for reliable predictions [63].…”
Section: Adversarial Defense (mentioning, confidence: 99%)
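A rough sketch of a Metzen-style "detector" sub-network: a small binary classifier attached to an intermediate representation and trained to separate clean from adversarially perturbed inputs. The feature dimension, architecture, and dummy training step below are assumptions for illustration, not the configuration from [56].

```python
# Illustrative sketch: binary "detector" head on intermediate features.
import torch
import torch.nn as nn

class Detector(nn.Module):
    def __init__(self, feat_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 64), nn.ReLU(),
            nn.Linear(64, 1),          # logit for "this input is adversarial"
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.net(features).squeeze(-1)

detector = Detector()
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(detector.parameters(), lr=1e-3)

# Dummy training step: features for clean (label 0) and perturbed (label 1)
# inputs would normally come from a frozen backbone network.
features = torch.randn(32, 256)
labels = torch.randint(0, 2, (32,)).float()
loss = criterion(detector(features), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```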
“…The third direction is to introduce a preprocessing function to transform the input samples and remove the adversarial perturbations by gradient masking [3,10,18,42,56]. The last category is to detect adversarial examples [4,13,23,34,51,54,58]. Compared with the first three directions, these methods do not need to train a new model with different structures or datasets, or to alter the inference computing pipeline.…”
Section: Defenses (mentioning, confidence: 99%)
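As a generic illustration of the preprocessing direction (not a reimplementation of any defense cited above), the sketch below applies a simple bit-depth-reduction transform to an input before classification; the bit depth and image shape are arbitrary assumptions.

```python
# Illustrative sketch: quantise inputs before they reach the classifier.
import numpy as np

def reduce_bit_depth(image: np.ndarray, bits: int = 4) -> np.ndarray:
    """Quantise an image in [0, 1] to 2**bits levels per channel."""
    levels = 2 ** bits - 1
    return np.round(image * levels) / levels

x = np.random.rand(32, 32, 3)            # placeholder input image in [0, 1]
x_squeezed = reduce_bit_depth(x, bits=4)
# classifier(x_squeezed) would then be evaluated instead of classifier(x)
```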
“…Detecting AEs. This methodology [23] explores the sample behaviors in the activation space of different network layers. The hypothesis is that the behaviors of normal samples are different from those of adversarial examples.…”
Section: Activation Space (mentioning, confidence: 99%)
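A hedged sketch of the hypothesis described above: for each layer, look up the nearest clean training samples in that layer's activation space and measure how often their labels agree with the model's final prediction, treating low agreement as a sign of an adversarial example. The layer names, neighbor count, and random placeholder data are assumptions, not details from [23].

```python
# Illustrative sketch: per-layer nearest-neighbour agreement with the prediction.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
layers = ["relu_2", "relu_4"]
train_acts = {name: rng.normal(size=(1000, 64)) for name in layers}
train_labels = rng.integers(0, 10, size=1000)

knn_per_layer = {
    name: KNeighborsClassifier(n_neighbors=5).fit(train_acts[name], train_labels)
    for name in layers
}

def agreement_score(sample_acts: dict, predicted_class: int) -> float:
    """Fraction of layers whose nearest neighbours vote for the predicted class."""
    votes = [
        knn_per_layer[name].predict(sample_acts[name].reshape(1, -1))[0]
        for name in layers
    ]
    return float(np.mean([v == predicted_class for v in votes]))

sample = {name: rng.normal(size=64) for name in layers}
score = agreement_score(sample, predicted_class=3)
print("agreement:", score)   # low agreement would be treated as adversarial
```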