2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2017
DOI: 10.1109/iros.2017.8206245
Adversarially Robust Policy Learning: Active construction of physically-plausible perturbations

Cited by 110 publications (95 citation statements)
References 15 publications
“…RARL, along with similar methods [12], is able to achieve some robustness, but the level of variation seen during training may not be diverse enough to resemble the variety encountered in the real world. Specifically, the adversary does not actively seek catastrophic outcomes as does the agent constructed in this paper.…”
Section: Introduction
Confidence: 99%
“…However, the nature of the algorithm makes it computationally intensive and increases the delay in detection, because the target agent has to be fooled before the master agent can begin its defense procedure. Furthermore, adversarially robust policy learning (ARPL) was proposed in [180]. This algorithm, targeted at the defense of autonomous agents in physical domains such as self-driving cars and robots, uses adversarial agents during the training of RL agents to make them resilient to adversarial attacks in the form of changes in the environment.…”
Section: B Defense Against Adversarial Attacks in RL
Confidence: 99%
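The statement above describes the core idea of adversarial training for RL agents: an adversary perturbs the environment or observations during training so the learned policy becomes robust to such changes. A minimal sketch of one common instantiation, a gradient-based (FGSM-style) state perturbation against a toy linear policy, is shown below. This is an illustration of the general technique only, not the ARPL authors' implementation; the function names and the linear "policy" are assumptions made for the example.

```python
import numpy as np

def adversarial_perturbation(state, grad, epsilon=0.1):
    """FGSM-style step: shift the state a small amount epsilon in the
    direction indicated by the supplied gradient."""
    return state + epsilon * np.sign(grad)

def perturb_against_policy(state, policy_weights, epsilon=0.1):
    """Adversary step against a toy linear policy whose action score
    is w . s.  The gradient of the score w.r.t. the state is simply w,
    so following the negative gradient lowers the agent's score."""
    grad_of_score = policy_weights            # d(w . s)/ds = w
    return adversarial_perturbation(state, -grad_of_score, epsilon)

# During training, the agent would be updated on these perturbed
# states instead of the clean ones, yielding a more robust policy.
state = np.array([1.0, -2.0, 0.5])
w = np.array([0.3, -0.1, 0.8])
adv_state = perturb_against_policy(state, w, epsilon=0.1)

# The perturbed state yields a strictly lower score than the clean one.
assert w @ adv_state < w @ state
```

In a full training loop the gradient would come from backpropagation through the policy network rather than a closed form, but the perturb-then-train structure is the same.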
“…All adversaries could be subsumed via a single adversary with a large admissible set. However, the resulting dynamics would not capture the underlying structure of the simulation gap [8], and the optimal policy would be too conservative [26]. Therefore, we disambiguate between the different adversaries to capture this structure.…”
Section: P S
Confidence: 99%