2017
DOI: 10.48550/arxiv.1701.04143
Preprint

Vulnerability of Deep Reinforcement Learning to Policy Induction Attacks

Cited by 9 publications (17 citation statements)
References 0 publications
“…This suggests that transferability is not an inherent property of non-robust ML models and that it may be possible to prevent black-box attacks (at least for some model classes) despite the existence of adversarial examples. 3…”
Section: Limits of Transferability (mentioning)
Confidence: 99%
“…Through slight perturbations of a machine learning (ML) model's inputs at test time, it is possible to generate adversarial examples that cause the model to misclassify at a high rate [4,22]. Adversarial examples can be used to craft human-recognizable images that are misclassified by computer vision models [22,6,14,10,13], software containing malware but classified as benign [21,26,7,8], and game environments that force reinforcement learning agents to misbehave [9,3,12].…”
Section: Introduction (mentioning)
Confidence: 99%
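
The excerpt above describes adversarial examples as slight test-time perturbations of a model's inputs that induce misclassification at a high rate. A minimal sketch of one common way to craft such perturbations, the Fast Gradient Sign Method, is shown below; the model, inputs, labels, and epsilon value are hypothetical placeholders, and this is not the specific attack used in the cited work.

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, x, y, epsilon=0.01):
    """Craft an adversarial example by nudging the input in the
    direction that increases the classification loss (FGSM).
    `model`, `x`, `y`, and `epsilon` are illustrative placeholders."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    # A small step along the sign of the input gradient keeps the
    # perturbation visually negligible yet can flip the prediction.
    return (x_adv + epsilon * x_adv.grad.sign()).detach()
```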
“…While the interest in deep RL solutions is extending into numerous domains such as intelligent transportation systems [1], finance [6] and critical infrastructure [15], ensuring the security and reliability of such solutions in adversarial conditions is only at its preliminary stages. Recently, Behzadan and Munir [3] reported the vulnerability of deep reinforcement learning algorithms to both test-time and training-time attacks using adversarial examples [9]. This work was followed by a number of further investigations (e.g., [11], [12]), verifying the fragility of deep RL agents to such attacks.…”
Mentioning
Confidence: 98%
“…To this end, we evaluate the performance of Deep Q-Network (DQN) models trained with parameter noise, against the test-time and training-time adversarial example attacks introduced in [3]. Main contributions of this work are: The remainder of this paper is organized as follows: Section 1 reviews the relevant background of DQN, parameter noise training via the NoisyNet approach, and adversarial examples.…”
Mentioning
Confidence: 99%
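
The last excerpt refers to DQN models trained with parameter noise via the NoisyNet approach. Below is a minimal sketch of a NoisyNet-style linear layer with factorized Gaussian noise, assuming the standard formulation from the NoisyNet literature; it is not the exact implementation evaluated in the citing paper, and all names and hyperparameters are illustrative.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoisyLinear(nn.Module):
    """Linear layer whose weights and biases are perturbed by learned,
    factorized Gaussian noise (NoisyNet-style); a sketch, not the
    cited paper's code."""
    def __init__(self, in_features, out_features, sigma0=0.5):
        super().__init__()
        self.in_features, self.out_features = in_features, out_features
        self.mu_w = nn.Parameter(torch.empty(out_features, in_features))
        self.sigma_w = nn.Parameter(
            torch.full((out_features, in_features), sigma0 / math.sqrt(in_features)))
        self.mu_b = nn.Parameter(torch.empty(out_features))
        self.sigma_b = nn.Parameter(
            torch.full((out_features,), sigma0 / math.sqrt(in_features)))
        bound = 1.0 / math.sqrt(in_features)
        nn.init.uniform_(self.mu_w, -bound, bound)
        nn.init.uniform_(self.mu_b, -bound, bound)

    @staticmethod
    def _f(x):
        # Factorized-noise transform f(x) = sign(x) * sqrt(|x|).
        return x.sign() * x.abs().sqrt()

    def forward(self, x):
        # Sample fresh input/output noise vectors and combine them into
        # a rank-one weight perturbation for this forward pass.
        eps_in = self._f(torch.randn(self.in_features, device=x.device))
        eps_out = self._f(torch.randn(self.out_features, device=x.device))
        w = self.mu_w + self.sigma_w * torch.outer(eps_out, eps_in)
        b = self.mu_b + self.sigma_b * eps_out
        return F.linear(x, w, b)
```

In a DQN built from such layers, the noise scales are learned alongside the weights, so exploration and parameter perturbation come from the network itself rather than from epsilon-greedy action selection.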