Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence 2017
DOI: 10.24963/ijcai.2017/525

Tactics of Adversarial Attack on Deep Reinforcement Learning Agents

Abstract: We introduce two tactics, namely the strategically-timed attack and the enchanting attack, to attack reinforcement learning agents trained by deep reinforcement learning algorithms using adversarial examples. In the strategically-timed attack, the adversary aims at minimizing the agent's reward by attacking the agent at only a small subset of time steps in an episode. Limiting the attack activity to this subset helps prevent detection of the attack by the agent. We propose a novel method to determine when an ad…
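The timing idea behind the strategically-timed attack can be illustrated with a simple preference-gap heuristic: attack only at time steps where the policy strongly prefers one action over the others, so a small number of well-placed perturbations is enough to hurt the return. The sketch below is a minimal illustration of that idea, assuming a PyTorch softmax policy; `policy_net` and the threshold value are placeholders, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def should_attack(policy_net, state, threshold=0.8):
    """Decide whether to attack at this time step.

    Heuristic sketch: attack only when the gap between the most- and
    least-preferred action probabilities exceeds a threshold. The network
    and threshold are illustrative placeholders.
    """
    with torch.no_grad():
        logits = policy_net(state.unsqueeze(0))       # shape: (1, num_actions)
        probs = F.softmax(logits, dim=-1).squeeze(0)  # action distribution
    preference_gap = (probs.max() - probs.min()).item()
    return preference_gap > threshold
```

Under such a criterion the adversary stays silent at ambiguous states and spends its perturbation budget only where a forced action change is most likely to matter.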

Cited by 217 publications (179 citation statements); references 4 publications.
“…adversarial examples that change almost every pixel in the input state) has previously been generated by a white-box, policy-access-based approach [21], where the adversarial examples are computed via backpropagation. Lin et al. [4] proposed the strategically-timed attack and the so-called enchanting attack, but the adversary generation still relies on a white-box policy-access assumption and full-state perturbation. Besides, Kos et al. [22] compared the influence of full-state perturbations with random noise, and utilized the value function to guide the adversary injection.…”
Section: Related Work
confidence: 99%
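The white-box, backpropagation-based generation referenced in this excerpt is typically an FGSM-style perturbation of the observation. A minimal sketch under that assumption (PyTorch; the surrogate loss and names are illustrative, not the exact procedure of any cited paper):

```python
import torch
import torch.nn.functional as F

def fgsm_state_perturbation(policy_net, state, epsilon=0.01):
    """FGSM-style full-state perturbation under white-box policy access.

    Pushes the observation in the direction that lowers the probability of
    the currently preferred action. This is a sketch of the general
    technique, not the implementation from the cited papers.
    """
    state = state.clone().detach().requires_grad_(True)
    logits = policy_net(state.unsqueeze(0))
    target = logits.argmax(dim=-1)             # currently preferred action
    loss = F.cross_entropy(logits, target)     # raise loss on that action
    loss.backward()
    adv_state = state + epsilon * state.grad.sign()
    return adv_state.detach()
```

In a strategically-timed setting, a perturbation like this would only be applied at the time steps selected by a criterion such as the preference gap sketched above.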
“…These adversaries can easily fool even seemingly high-performing deep learning models with human-imperceptible perturbations. Such vulnerabilities of deep learning models have been well studied in supervised learning, and also to some extent in RL [3], [4], [20].…”
Section: B. Adversarial Attack
confidence: 99%
“…One line of systematic study of the first question started in image classification, with seminal early observations from Szegedy et al. (2013) that deep artificial neural networks are brittle to adversarial changes in inputs that would otherwise be imperceptible to the human eye. This computer-vision weakness has since been used as an angle of attack to design adversaries for reinforcement-learning agents (Lin et al., 2017), followed by general formal insights on adversarial reinforcement learning in the more classical bandit setting (Jun, Li, Ma, & Zhu, 2018). To analyse human choice frailty, our framework involves two steps, the key one being a machine-vs-machine adversarial step in which a (deep) reinforcement-learning agent is trained to be an adversary to an RNN; the latter model is trained in a previous step to emulate human decisions, following (Dezfouli et al., 2018; Dezfouli, Ashtiani, et al., 2019; Dezfouli, Griffiths, et al., 2019).…”
Section: Introduction
confidence: 99%