Robotics: Science and Systems XVII 2021
DOI: 10.15607/rss.2021.xvii.026

Safe Reinforcement Learning via Statistical Model Predictive Shielding

Abstract: Reinforcement learning is a promising approach to solving hard robotics tasks. An important challenge is ensuring safety, e.g., that a walking robot does not fall over or an autonomous car does not crash into an obstacle. We build on an approach that composes the learned policy with a backup policy: it uses the learned policy on the interior of the region where the backup policy is guaranteed to be safe, and switches to the backup policy on the boundary of this region. The key challenge is checking when the back…

Cited by 28 publications (25 citation statements)
References 31 publications (73 reference statements)
“…Note that when w = 0, it is equivalent to the removal of the reachability criteria in our algorithm, and similar to a shielding-based MPC approach in [43]. In addition, w = 0 does not mean our method is similar to recovery RL, since it can trigger π_safe before reaching the safety boundary.…”
Section: Catwalk (mentioning)
confidence: 99%
“…Then, the shield policy uses π if x ∈ X_rec, and π_0 otherwise (Bastani, 2019). This policy guarantees safety as long as an initial state is recoverable, i.e., x_0 ∈ X_rec.…”
Section: Application To Safe Planning (mentioning)
confidence: 99%
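
The rule quoted above is simple to state in code. Below is a minimal sketch of the composed (shielded) policy, assuming hypothetical arguments: a membership test is_recoverable for the set X_rec, a learned policy pi_learned (π), and a verified backup policy pi_backup (π_0). None of these names come from the cited papers.

def shield_policy(x, pi_learned, pi_backup, is_recoverable):
    """Shielded policy: learned policy inside X_rec, backup otherwise.

    All arguments are assumptions for this sketch: is_recoverable is a
    hypothetical membership test for the recoverable set X_rec, and the
    two policies map states to actions.
    """
    if is_recoverable(x):
        return pi_learned(x)  # x in X_rec: the learned policy may act
    return pi_backup(x)       # outside X_rec: fall back to the verified backup

As the excerpt notes, safety then follows by induction: if x_0 ∈ X_rec, the backup policy can always recover from any state the composed policy visits.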
“…There has also been work on safe learning-based control (Akametalu et al., 2014; Fisac et al., 2019; Bastani, 2019; Wabersich & Zeilinger, 2018; Alshiekh et al., 2018); however, these approaches are not applicable to perception-based control. The most closely related work is Dean et al. (2019), which handles perception, but they are restricted to known linear dynamics.…”
Section: Introduction (mentioning)
confidence: 99%
“…Many methods in this area leverage optimization tools to prove that a learned neural network policy satisfies a given safety constraint [23], [24], [25], [26], [18], [19], [22]. A related approach is shielding, which verifies a backup controller, and then overrides the LEC using the backup controller when it can no longer ensure that using the LEC is safe [16], [17], [20], [27]. While these methods provide strong mathematical guarantees, they suffer from a number of shortcomings.…”
Section: Introduction (mentioning)
confidence: 99%
“…We build on a recently proposed idea called model predictive shielding (MPS), which has been used to ensure safety of learned control policies [28], [27], including extensions to the multi-agent setting [29]. The basic idea is that rather than check whether a state is safe ahead of time, we can dynamically check whether we can maintain safety if we use the LEC, and only use the LEC if we can do so.…”
Section: Introduction (mentioning)
confidence: 99%
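
The dynamic check described in this excerpt can be sketched as follows. This is an illustrative reading of model predictive shielding, not the implementation from the cited works; the dynamics model step, the horizon H, and the predicates is_safe and is_invariant are all assumptions introduced here.

def recoverable_after(x, u_lec, step, pi_backup, is_safe, is_invariant, H=50):
    """Model predictive shielding check (illustrative sketch).

    Rather than certifying states ahead of time, simulate what happens
    if the learned controller (LEC) acts once and the backup policy
    then takes over: the LEC action u_lec is allowed only if the backup
    rollout from the resulting state stays safe and reaches a known
    safe invariant set within H steps. step is an assumed dynamics
    model f(x, u) -> x'; is_safe and is_invariant are assumed state
    predicates.
    """
    x_next = step(x, u_lec)            # state after one LEC action
    for _ in range(H):
        if not is_safe(x_next):
            return False               # backup rollout violates safety
        if is_invariant(x_next):
            return True                # reached a provably safe set
        x_next = step(x_next, pi_backup(x_next))
    return False                       # could not certify within H steps

def mps_policy(x, pi_learned, pi_backup, step, is_safe, is_invariant):
    """Use the LEC only when the rollout check above succeeds."""
    u = pi_learned(x)
    if recoverable_after(x, u, step, pi_backup, is_safe, is_invariant):
        return u
    return pi_backup(x)

The design choice this illustrates is the one named in the excerpt: safety is checked dynamically at each step by simulation, rather than by precomputing a certified safe region for the learned policy.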