2019 International Joint Conference on Neural Networks (IJCNN)
DOI: 10.1109/ijcnn.2019.8852285

Detecting Adversarial Perturbations Through Spatial Behavior in Activation Spaces

Abstract: Neural-network-based classifiers are still prone to manipulation through adversarial perturbations. State-of-the-art attacks can overcome most of the defense or detection mechanisms suggested so far, and adversaries have the upper hand in this arms race. Adversarial examples are designed to resemble the normal input from which they were constructed, while triggering an incorrect classification. This basic design goal leads to a characteristic spatial behavior within the context of Activation Spaces, a term coin…
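The abstract's notion of per-layer "activation spaces" can be made concrete with a small sketch. The snippet below is an illustration, not the authors' code: it uses PyTorch forward hooks on a toy MLP to capture each hidden layer's output, which is the kind of representation a spatial analysis would operate on. The architecture, layer names, and dummy input are assumptions.

```python
# Illustrative sketch: expose per-layer "activation spaces" with forward hooks.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 256), nn.ReLU(),
    nn.Linear(256, 128), nn.ReLU(),
    nn.Linear(128, 10),
)

activations = {}

def make_hook(name):
    def hook(module, inputs, output):
        # Store a detached copy of this layer's output for later analysis.
        activations[name] = output.detach()
    return hook

# Register one hook per ReLU so each hidden "activation space" is captured.
for idx, module in enumerate(model):
    if isinstance(module, nn.ReLU):
        module.register_forward_hook(make_hook(f"relu_{idx}"))

x = torch.randn(4, 1, 28, 28)      # dummy batch standing in for real inputs
logits = model(x)
for name, act in activations.items():
    print(name, tuple(act.shape))  # e.g. relu_2 (4, 256), relu_4 (4, 128)
```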

Cited by 19 publications (10 citation statements). References 32 publications.
“…The aim of activation output clustering is to detect anomalous input by analyzing the outputs of a certain hidden layer (usually the last) based on the belief that the normal and anomalous inputs are significantly different in a certain space [86], [87]. The anomalous input can be adversarial input of evasion attack or an input with triggers for poisoning attack, such that the technology is defensive against backdoor poisoning attack and evasion attack and is validated for DDMs [87], [88], [89]. However, this technology does not work for DGMs.…”
Section: Activation Output Clustering (mentioning, confidence: 99%)
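A minimal sketch of the activation-output-clustering idea described above, assuming the last hidden layer's activations are already available as NumPy arrays. The array names, cluster count, and percentile threshold are illustrative assumptions, not taken from [86]–[89].

```python
# Illustrative sketch: cluster clean activations, flag inputs far from all clusters.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
clean_acts = rng.normal(size=(1000, 128))   # placeholder clean activations
test_acts = rng.normal(size=(10, 128))      # placeholder incoming activations

# Cluster the clean activations; anomalous inputs are expected to fall far
# from every cluster centre.
kmeans = KMeans(n_clusters=10, n_init=10, random_state=0).fit(clean_acts)

# Distance of each clean sample to its nearest centroid calibrates a threshold.
clean_dist = kmeans.transform(clean_acts).min(axis=1)
threshold = np.percentile(clean_dist, 99)   # illustrative 99th-percentile cut-off

test_dist = kmeans.transform(test_acts).min(axis=1)
is_anomalous = test_dist > threshold
print(is_anomalous)
```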
“…Metzen et al. implemented deep neural networks with a small "detector" sub-network trained on the binary classification task of distinguishing factual data from data containing adversarial perturbations [56]. The same year, Madry et al. [55] [44]. A different notable strategy was taken by Pang et al., who used a thresholding approach as the detector to filter out adversarial examples for reliable predictions [63].…”
Section: Adversarial Defense (mentioning, confidence: 99%)
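A rough sketch of a Metzen-style "detector" sub-network: a small binary classifier attached to an intermediate representation and trained to separate clean from adversarially perturbed inputs. The feature dimension, architecture, and dummy training step below are assumptions for illustration, not the configuration from [56].

```python
# Illustrative sketch: binary "detector" head on intermediate features.
import torch
import torch.nn as nn

class Detector(nn.Module):
    def __init__(self, feat_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 64), nn.ReLU(),
            nn.Linear(64, 1),          # logit for "this input is adversarial"
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.net(features).squeeze(-1)

detector = Detector()
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(detector.parameters(), lr=1e-3)

# Dummy training step: features for clean (label 0) and perturbed (label 1)
# inputs would normally come from a frozen backbone network.
features = torch.randn(32, 256)
labels = torch.randint(0, 2, (32,)).float()
loss = criterion(detector(features), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```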
“…The third direction is to introduce a preprocessing function to transform the input samples and remove the adversarial perturbations by gradient masking [3,10,18,42,56]. The last category is to detect adversarial examples [4,13,23,34,51,54,58]. Compared with the first three directions, these methods do not need to train a new model with different structures or datasets, or to alter the inference computing pipeline.…”
Section: Defenses (mentioning, confidence: 99%)
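As a generic illustration of the preprocessing direction (not a reimplementation of any defense cited above), the sketch below applies a simple bit-depth-reduction transform to an input before classification; the bit depth and image shape are arbitrary assumptions.

```python
# Illustrative sketch: quantise inputs before they reach the classifier.
import numpy as np

def reduce_bit_depth(image: np.ndarray, bits: int = 4) -> np.ndarray:
    """Quantise an image in [0, 1] to 2**bits levels per channel."""
    levels = 2 ** bits - 1
    return np.round(image * levels) / levels

x = np.random.rand(32, 32, 3)            # placeholder input image in [0, 1]
x_squeezed = reduce_bit_depth(x, bits=4)
# classifier(x_squeezed) would then be evaluated instead of classifier(x)
```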
“…Detecting AEs. This methodology [23] explores the sample behaviors in the activation space of different network layers. The hypothesis is that the behaviors of normal samples are different from those of adversarial examples.…”
Section: Activation Space (mentioning, confidence: 99%)
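A hedged sketch of the hypothesis described above: for each layer, look up the nearest clean training samples in that layer's activation space and measure how often their labels agree with the model's final prediction, treating low agreement as a sign of an adversarial example. The layer names, neighbor count, and random placeholder data are assumptions, not details from [23].

```python
# Illustrative sketch: per-layer nearest-neighbour agreement with the prediction.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
layers = ["relu_2", "relu_4"]
train_acts = {name: rng.normal(size=(1000, 64)) for name in layers}
train_labels = rng.integers(0, 10, size=1000)

knn_per_layer = {
    name: KNeighborsClassifier(n_neighbors=5).fit(train_acts[name], train_labels)
    for name in layers
}

def agreement_score(sample_acts: dict, predicted_class: int) -> float:
    """Fraction of layers whose nearest neighbours vote for the predicted class."""
    votes = [
        knn_per_layer[name].predict(sample_acts[name].reshape(1, -1))[0]
        for name in layers
    ]
    return float(np.mean([v == predicted_class for v in votes]))

sample = {name: rng.normal(size=64) for name in layers}
score = agreement_score(sample, predicted_class=3)
print("agreement:", score)   # low agreement would be treated as adversarial
```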