2019
DOI: 10.1186/s42400-019-0027-x

Adversarial attack and defense in reinforcement learning-from AI security view

Abstract: Reinforcement learning is a core technology for modern artificial intelligence, and it has become a workhorse for AI applications ranging from Atari games to Connected and Automated Vehicle Systems (CAV). A reliable RL system is therefore the foundation for security-critical applications in AI, and its robustness has attracted more concern than ever. However, recent studies show that adversarial attacks are also effective when targeting neural network policies in the cont…

Cited by 94 publications (53 citation statements)
References: 34 publications
“…Other researchers identified an emerging interest for deepfake ransomware [48] in certain cybercriminal circles. Beyond that, it has been demonstrated that via a replica of a victim intelligent system (a deep reinforcement learning agent), the policies of the victim system can be compromised in a targeted way [49].…”
Section: RDA for AI Risk Instantiations Ia and Ib - Examples
confidence: 99%
“…Would there be any guarantees for convergence in such a twisted model? Some approaches do try to create adversarial examples to make the models better suited for outliers (Pinto et al, 2017b;Chen et al, 2019). Our work is novel in that we wish for the model to learn what does not work, leading to better exploration and more accessible domain adaptation to unseen tasks not by changing the tasks or rewards but by letting the same embedding network learn both skills.…”
Section: Introduction
confidence: 99%
“…In more advanced attack models known as insider attacks, attacker falsifies the data input by considering the target DNN structure of the learning model. There are two distinct adversarial attack settings on learning agents: white-box attack where attackers have access to the training model of learning agent and interacts with target model for generating adversarial inputs, and black-box attack where malicious inputs are generated from an estimated training model which is close to the true target model of learning agent [3]. In this paper, we thoroughly investigate security vulnerabilities of DRL based TSCs under two adversarial attack models namely Fast Gradient Sign Method (FGSM) [4] and Jacobian-based Saliency Map Attack (JSMA) [5] with white-box and black-box settings.…”
Section: Introduction
confidence: 99%
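The white-box FGSM setting described in the citation above can be illustrated in a few lines. The following is a minimal sketch, assuming a PyTorch policy network over discrete actions and observations scaled to [0, 1]; the function name fgsm_perturb and its parameters are illustrative placeholders, not taken from the cited works.

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(policy_net, state, target_action, epsilon=0.01):
    """Craft a white-box FGSM adversarial observation for a policy network.

    policy_net:    maps a batched observation tensor to action logits.
    state:         observation tensor of shape (1, obs_dim), values in [0, 1].
    target_action: tensor of shape (1,), the action whose selection the
                   attacker wants to degrade.
    epsilon:       L-infinity perturbation budget.
    """
    state = state.clone().detach().requires_grad_(True)
    logits = policy_net(state)
    # Increase the loss of the currently preferred action so the perturbed
    # observation pushes the policy away from it.
    loss = F.cross_entropy(logits, target_action)
    loss.backward()
    # One signed-gradient step, clipped back to the valid observation range.
    adv_state = state + epsilon * state.grad.sign()
    return adv_state.clamp(0.0, 1.0).detach()
```

In a black-box setting, the same step would instead be computed against a substitute model trained to approximate the victim policy, with the resulting perturbations transferred to the true target.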