Robotics: Science and Systems XVII 2021
DOI: 10.15607/rss.2021.xvii.026

Safe Reinforcement Learning via Statistical Model Predictive Shielding

Abstract: Reinforcement learning is a promising approach to solving hard robotics tasks. An important challenge is ensuring safety, e.g., that a walking robot does not fall over or an autonomous car does not crash into an obstacle. We build on an approach that composes the learned policy with a backup policy: it uses the learned policy on the interior of the region where the backup policy is guaranteed to be safe, and switches to the backup policy on the boundary of this region. The key challenge is checking when the back…

Cited by 28 publications (25 citation statements)
References 31 publications (73 reference statements)
“…Note that when w = 0, it is equivalent to the removal of the reachability criteria in our algorithm, and similar to a shielding-based MPC approach in [43]. In addition, w = 0 does not mean our method is similar to recovery RL, since it can trigger π_safe before reaching the safety boundary.…”
Section: Catwalk (mentioning)
confidence: 99%
“…Then, the shield policy uses π if x ∈ X_rec, and π_0 otherwise (Bastani, 2019). This policy guarantees safety as long as an initial state is recoverable, i.e., x_0 ∈ X_rec.…”
Section: Application To Safe Planning (mentioning)
confidence: 99%
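
The rule quoted above is simple to state in code. Below is a minimal sketch of the composed (shielded) policy, assuming hypothetical arguments: a membership test is_recoverable for the set X_rec, a learned policy pi_learned (π), and a verified backup policy pi_backup (π_0). None of these names come from the cited papers.

def shield_policy(x, pi_learned, pi_backup, is_recoverable):
    """Shielded policy: learned policy inside X_rec, backup otherwise.

    All arguments are assumptions for this sketch: is_recoverable is a
    hypothetical membership test for the recoverable set X_rec, and the
    two policies map states to actions.
    """
    if is_recoverable(x):
        return pi_learned(x)  # x in X_rec: the learned policy may act
    return pi_backup(x)       # outside X_rec: fall back to the verified backup

As the excerpt notes, safety then follows by induction: if x_0 ∈ X_rec, the backup policy can always recover from any state the composed policy visits.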
“…There has also been work on safe learning-based control (Akametalu et al., 2014; Fisac et al., 2019; Bastani, 2019; Wabersich & Zeilinger, 2018; Alshiekh et al., 2018); however, these approaches are not applicable to perception-based control. The most closely related work is Dean et al. (2019), which handles perception, but they are restricted to known linear dynamics.…”
Section: Introduction (mentioning)
confidence: 99%
“…Many methods in this area leverage optimization tools to prove that a learned neural network policy satisfies a given safety constraint [23], [24], [25], [26], [18], [19], [22]. A related approach is shielding, which verifies a backup controller, and then overrides the LEC using the backup controller when it can no longer ensure that using the LEC is safe [16], [17], [20], [27]. While these methods provide strong mathematical guarantees, they suffer from a number of shortcomings.…”
Section: Introduction (mentioning)
confidence: 99%
“…We build on a recently proposed idea called model predictive shielding (MPS), which has been used to ensure safety of learned control policies [28], [27], including extensions to the multi-agent setting [29]. The basic idea is that rather than check whether a state is safe ahead of time, we can dynamically check whether we can maintain safety if we use the LEC, and only use the LEC if we can do so.…”
Section: Introduction (mentioning)
confidence: 99%
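
The dynamic check described in this excerpt can be sketched as follows. This is an illustrative reading of model predictive shielding, not the implementation from the cited works; the dynamics model step, the horizon H, and the predicates is_safe and is_invariant are all assumptions introduced here.

def recoverable_after(x, u_lec, step, pi_backup, is_safe, is_invariant, H=50):
    """Model predictive shielding check (illustrative sketch).

    Rather than certifying states ahead of time, simulate what happens
    if the learned controller (LEC) acts once and the backup policy
    then takes over: the LEC action u_lec is allowed only if the backup
    rollout from the resulting state stays safe and reaches a known
    safe invariant set within H steps. step is an assumed dynamics
    model f(x, u) -> x'; is_safe and is_invariant are assumed state
    predicates.
    """
    x_next = step(x, u_lec)            # state after one LEC action
    for _ in range(H):
        if not is_safe(x_next):
            return False               # backup rollout violates safety
        if is_invariant(x_next):
            return True                # reached a provably safe set
        x_next = step(x_next, pi_backup(x_next))
    return False                       # could not certify within H steps

def mps_policy(x, pi_learned, pi_backup, step, is_safe, is_invariant):
    """Use the LEC only when the rollout check above succeeds."""
    u = pi_learned(x)
    if recoverable_after(x, u, step, pi_backup, is_safe, is_invariant):
        return u
    return pi_backup(x)

The design choice this illustrates is the one named in the excerpt: safety is checked dynamically at each step by simulation, rather than by precomputing a certified safe region for the learned policy.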