2018
DOI: 10.1007/978-3-319-99229-7_34

Mitigation of Policy Manipulation Attacks on Deep Q-Networks with Parameter-Space Noise

Abstract: Recent developments have established the vulnerability of deep reinforcement learning to policy manipulation attacks via intentionally perturbed inputs, known as adversarial examples. In this work, we propose a technique for mitigation of such attacks based on addition of noise to the parameter space of deep reinforcement learners during training. We experimentally verify the effect of parameter-space noise in reducing the transferability of adversarial examples, and demonstrate the promising performance of th…

Cited by 12 publications (4 citation statements) | References 21 publications

“…Similarly, Pinto et al. [24] introduced two agents playing a zero-sum discounted game to ensure the robustness of policy learning. In contrast to adversarial play, Behzadan et al. [4] adopt an equivalent model, the noisy network, to generate adversarial samples via FGSM. Neklyudov et al. [21] applied a Gaussian variance layer to generate adversarial samples, and empirical results show that this method is effective in improving agents' exploration ability and robustness.…”
Section: Adversarial Defense
confidence: 99%
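
The FGSM attack referenced in the statement above can be summarized in a few lines. The following is a minimal sketch of a single-step FGSM perturbation of an observation against a Q-network, assuming a PyTorch model; the cross-entropy surrogate loss on the greedy action and the epsilon value are illustrative assumptions, not details taken from the cited work.

import torch
import torch.nn.functional as F

def fgsm_observation(model: torch.nn.Module, obs: torch.Tensor, epsilon: float = 0.01) -> torch.Tensor:
    """Return an adversarially perturbed copy of a batch of observations (sketch only)."""
    obs = obs.clone().detach().requires_grad_(True)
    q_values = model(obs)
    # Treat the current greedy action as the "label" and push the policy away from it.
    greedy_action = q_values.argmax(dim=1)
    loss = F.cross_entropy(q_values, greedy_action)
    loss.backward()
    # Single-step perturbation in the direction of the sign of the input gradient.
    return (obs + epsilon * obs.grad.sign()).detach()
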
“…The DRL is implemented in Python with the PyTorch package; the inputs to the agent and the attacker are image frames from Atari games, resized to 84×84 pixels, and the policy network is a classical convolutional neural network that maps the input to the action space. There are three convolutional layers with sizes (32, 8, 8, 4), (64, 4, 4, 2) and (64, 3, 3, 1), where the first value in each tuple is the number of filters, the second and third values denote the filter size, and the last value is the stride. The last two layers are fully connected layers that map the hidden representation to the action space, with weight shapes (3136, 512) and (512, action space).…”
Section: Settings
confidence: 99%
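
The architecture quoted above can be written down directly in PyTorch. The sketch below follows the stated layer sizes; the number of input channels (4 stacked frames, the usual DQN convention) and the class and variable names are assumptions not given in the quoted text.

import torch
import torch.nn as nn

class PolicyNetwork(nn.Module):
    def __init__(self, num_actions: int, in_channels: int = 4):  # in_channels assumed, not stated in the quote
        super().__init__()
        # Three convolutional layers: (filters, kernel, kernel, stride)
        # = (32, 8, 8, 4), (64, 4, 4, 2), (64, 3, 3, 1) as quoted.
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=8, stride=4),
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2),
            nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1),
            nn.ReLU(),
        )
        # For an 84x84 input the convolutional stack yields 64 x 7 x 7 = 3136 features,
        # matching the quoted fully connected shapes (3136, 512) and (512, action space).
        self.head = nn.Sequential(
            nn.Linear(3136, 512),
            nn.ReLU(),
            nn.Linear(512, num_actions),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        return self.head(x.flatten(start_dim=1))

# Example: Q-values for a single 4-frame 84x84 observation.
q_values = PolicyNetwork(num_actions=6)(torch.zeros(1, 4, 84, 84))
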
“…Randomization methods [64, 2] were first proposed to encourage exploration. NoisyNet [24] adds parametric noise to the network's weights during training, providing better resilience to both training-time and test-time attacks [5, 6]. Under the adversarial training framework, Kos et al. [38] and Behzadan et al. [5] show that re-training with random noise and FGSM perturbations increases resilience against adversarial examples.…”
Section: Related Work
confidence: 99%
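
Parameter-space noise of the kind NoisyNet applies is typically implemented by replacing ordinary linear layers with noisy ones. Below is a minimal sketch of a NoisyNet-style linear layer with factorised Gaussian noise, assuming PyTorch; the initialisation constants (e.g. sigma_init = 0.5) and the names used are illustrative assumptions rather than values taken from the cited papers.

import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoisyLinear(nn.Module):
    def __init__(self, in_features: int, out_features: int, sigma_init: float = 0.5):
        super().__init__()
        self.in_features, self.out_features = in_features, out_features
        # Learnable mean and noise-scale parameters for weights and biases.
        self.weight_mu = nn.Parameter(torch.empty(out_features, in_features))
        self.weight_sigma = nn.Parameter(torch.empty(out_features, in_features))
        self.bias_mu = nn.Parameter(torch.empty(out_features))
        self.bias_sigma = nn.Parameter(torch.empty(out_features))
        # Noise buffers are resampled, not trained.
        self.register_buffer("weight_eps", torch.zeros(out_features, in_features))
        self.register_buffer("bias_eps", torch.zeros(out_features))
        bound = 1.0 / math.sqrt(in_features)
        nn.init.uniform_(self.weight_mu, -bound, bound)
        nn.init.uniform_(self.bias_mu, -bound, bound)
        nn.init.constant_(self.weight_sigma, sigma_init / math.sqrt(in_features))
        nn.init.constant_(self.bias_sigma, sigma_init / math.sqrt(in_features))
        self.reset_noise()

    @staticmethod
    def _scaled_noise(size: int) -> torch.Tensor:
        x = torch.randn(size)
        return x.sign() * x.abs().sqrt()

    def reset_noise(self) -> None:
        # Factorised Gaussian noise: outer product of two scaled noise vectors.
        eps_in = self._scaled_noise(self.in_features)
        eps_out = self._scaled_noise(self.out_features)
        self.weight_eps.copy_(torch.outer(eps_out, eps_in))
        self.bias_eps.copy_(eps_out)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.training:
            weight = self.weight_mu + self.weight_sigma * self.weight_eps
            bias = self.bias_mu + self.bias_sigma * self.bias_eps
        else:
            weight, bias = self.weight_mu, self.bias_mu
        return F.linear(x, weight, bias)

During training the effective weights are mu + sigma * eps, so the perturbation lives in parameter space rather than in the observations; at evaluation time the layer falls back to the mean weights.
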
“…To secure the learning of RL policies, defenses against training-time attacks have been developed from the standpoint of robustness, which refers to the ability of an agent to maintain its functionality in the presence of perturbations [73] (illustrated in Figure 5.1). These robustness-based defenses [61, 73-80, 82] either theoretically or empirically guarantee the performance of the learned policy under perturbations at training time. Although robustness is a crucial issue, it is merely an add-on concern when designing RL algorithms, which can increase design costs or compromise policy performance.…”
Section: Introduction
confidence: 99%