2018
DOI: 10.48550/arxiv.1807.06064
Preprint

Online Robust Policy Learning in the Presence of Unknown Adversaries

Abstract: The growing prospect of deep reinforcement learning (DRL) being used in cyber-physical systems has raised concerns about the safety and robustness of autonomous agents. Recent work on generating adversarial attacks has shown that it is computationally feasible for a bad actor to fool a DRL policy into behaving suboptimally. Although certain adversarial attacks with specific attack models have been addressed, most studies are only interested in off-line optimization in the data space (e.g., example fitting, dist…

Citations: Cited by 1 publication (2 citation statements)
References: 20 publications
“…The results indicate that in both environments, DQN agents are able to recover from noncontiguous attacks with attack probabilities p = 0.2 and p = 0.4 and converge to optimal performance, while they fail to recover under attacks with p = 0.8 and p = 1.0 (contiguous attack). It is observed that for the agents that recover, the training performance deteriorates almost uniformly until a minimum point is reached, from which onward the agent begins to recover and adjust the policy towards optimal performance. The authors' interpretation of this behavior is based on the statistics of experience replay: for the agent to recover from adversarial perturbations, the number of interactions with the perturbed environment must reach a critical threshold, so that the randomly sampled batches from the experience memory can represent the statistical significance of perturbations.…”
[Table fragment from the citing survey: Hierarchical RL: Havens et al. [71]; Game Theoretic: Ogunmolu et al. [72], Bravo & Mertikopoulos [73]]
Section: State Of Defenses (mentioning)
confidence: 99%
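As an illustration of the setup this statement describes (each state observation replaced by a perturbed one with per-step probability p, so that p = 1.0 corresponds to a contiguous attack), the following Python sketch wraps a Gym environment accordingly. It is not the cited study's code: the wrapper name, the random-sign noise, and the epsilon value are assumptions standing in for a crafted adversarial perturbation such as an attack computed against the Q-network.

```python
# Minimal sketch, not the cited study's code: a Gym-style wrapper that
# perturbs the agent's observations with probability p, so p = 1.0 is a
# contiguous attack and p < 1.0 a noncontiguous one. The perturbation
# below is placeholder random-sign noise; the cited experiments would use
# a crafted adversarial perturbation instead.
import numpy as np
import gym


class NoncontiguousAttackWrapper(gym.ObservationWrapper):
    def __init__(self, env, attack_prob=0.4, epsilon=0.1, seed=0):
        super().__init__(env)
        self.attack_prob = attack_prob   # per-step probability of perturbing the observation
        self.epsilon = epsilon           # perturbation magnitude (hypothetical choice)
        self.rng = np.random.default_rng(seed)

    def perturb(self, obs):
        # Placeholder for an adversarial perturbation: epsilon-scaled random-sign noise.
        return obs + self.epsilon * self.rng.choice([-1.0, 1.0], size=np.shape(obs))

    def observation(self, obs):
        # With probability p the agent sees a perturbed state; otherwise the true state.
        if self.rng.random() < self.attack_prob:
            return self.perturb(obs)
        return obs


# Usage: train a DQN agent on the wrapped environment and compare recovery
# across attack probabilities, e.g. p in {0.2, 0.4, 0.8, 1.0}.
env = NoncontiguousAttackWrapper(gym.make("CartPole-v1"), attack_prob=0.2)
```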
“…During policy learning, information perturbation can generally be viewed as a bias that can prevent the agent from effectively learning the desired policy. Inspired by this idea, Havens et al. [71] propose a hierarchical meta-learning framework, named Meta-Learned Advantage Hierarchy (MLAH). Their work considers a policy learning problem where there are periods of adversarial attacks that corrupt state observations during the continuous learning of the agent, and aims at the online mitigation of the bias introduced by the attack into the nominal policy.…”
Section: State Of Defenses (mentioning)
confidence: 99%
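The idea summarized in this statement, a master agent that uses an advantage signal to decide which sub-policy should act at a given time, can be sketched schematically. The Python sketch below is not the authors' MLAH implementation; the class name, the threshold rule, and the stand-in sub-policies are assumptions made purely to illustrate advantage-based switching between a nominal policy and one adapted to corrupted observations.

```python
# Illustrative sketch only, not the MLAH implementation: a master agent maps
# an observed advantage signal to one of two sub-policies, so that the bias
# an attack injects into the nominal policy can be handled online. All names
# and the threshold rule are assumptions made for illustration.


class MasterAgent:
    """Routes each step to a sub-policy based on the observed advantage."""

    def __init__(self, nominal_policy, adversarial_policy, threshold=0.0):
        self.nominal_policy = nominal_policy          # policy learned on clean observations
        self.adversarial_policy = adversarial_policy  # policy adapted to corrupted observations
        self.threshold = threshold                    # advantage level that flags a suspected attack

    def act(self, state, observed_advantage):
        # A persistently negative advantage under the nominal policy is treated
        # here as evidence that the state observations are being corrupted.
        if observed_advantage < self.threshold:
            return self.adversarial_policy(state)
        return self.nominal_policy(state)


# Usage with toy stand-in policies:
agent = MasterAgent(nominal_policy=lambda s: 0, adversarial_policy=lambda s: 1)
action = agent.act(state=None, observed_advantage=-0.5)  # routes to the adversarial sub-policy
```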