2020 57th ACM/IEEE Design Automation Conference (DAC)
DOI: 10.1109/dac18072.2020.9218663

TrojDRL: Evaluation of Backdoor Attacks on Deep Reinforcement Learning

Abstract: We present TrojDRL, a tool for exploring and evaluating backdoor attacks on deep reinforcement learning agents. TrojDRL exploits the sequential nature of deep reinforcement learning (DRL) and considers different gradations of threat models. We show that untargeted attacks on state-of-the-art actor-critic algorithms can circumvent existing defenses built on the assumption of backdoors being targeted. We evaluated TrojDRL on a broad set of DRL benchmarks and showed that the attacks require only poisoning as litt…

Cited by 51 publications (58 citation statements)
References 14 publications
“…Unlike classification, in DRL, data is generated by an agent interacting with the environment. TrojDRL (Kiourti et al., 2020) proposes data poisoning attacks on DRL agents by altering the observations, after they have been generated by the environment, under a man-in-the-middle (MITM) attack model. More precisely, the observations are altered by a third party before arriving at the agent to be processed.…”
Section: Trojans in Reinforcement Learning Agents
Confidence: 99%
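The MITM poisoning this citation describes can be sketched as a transformation applied to each observation after the environment emits it and before the agent receives it. The function name, trigger pattern, and patch size below are illustrative assumptions for a pixel-based benchmark, not TrojDRL's actual implementation:

```python
import numpy as np

def poison_observation(obs, trigger_value=255, patch_size=3):
    """Man-in-the-middle poisoning sketch: stamp a small, fixed trigger
    patch onto an observation in transit from environment to agent.
    All parameter choices here are hypothetical."""
    poisoned = obs.copy()  # leave the environment's own state untouched
    poisoned[:patch_size, :patch_size] = trigger_value  # top-left trigger patch
    return poisoned

# Example: an 84x84 grayscale frame, a common shape in Atari DRL benchmarks
frame = np.zeros((84, 84), dtype=np.uint8)
triggered = poison_observation(frame)
```

During poisoned training steps, the attacker would pair such triggered observations with manipulated rewards or actions; at test time, the same patch activates the backdoored behavior.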
“…In this work, we consider an attack model where the DRL environments generate poisoned observations. Our contributions are two-fold: 1) our proposed method of training agents with triggered behavior is simpler than the algorithms outlined by Kiourti et al. (2020), cast as a multitask learning problem, which is an approach that, to the best of our knowledge, has not been explored by other published methods for this purpose, and 2) it enables further research into triggers which may not be as easily supported by the MITM attack model, such as triggers that may emerge from multiple agents interacting in the environment.…”
Section: Trojans in Reinforcement Learning Agents
Confidence: 99%
“…There are backdoor attacks against other tasks or paradigms, such as Refs. [20][21][22] in the area of natural language processing, Refs. [23,24] in reinforcement learning, Refs.…”
Section: Introduction
Confidence: 99%