2018 · DOI: 10.1609/aaai.v32i1.11797
Safe Reinforcement Learning via Shielding

Abstract: Reinforcement learning algorithms discover policies that maximize reward, but do not necessarily guarantee safety during learning or execution phases. We introduce a new approach to learn optimal policies while enforcing properties expressed in temporal logic. To this end, given the temporal logic specification that is to be obeyed by the learning system, we propose to synthesize a reactive system called a shield. The shield monitors the actions from the learner and corrects them only if the chosen action caus…
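To make the mechanism described in the abstract concrete, here is a minimal sketch of a shield wrapping an agent's action selection. It assumes a hypothetical `safe_actions(state)` oracle standing in for the reactive system that the paper synthesizes from the temporal-logic specification; it illustrates the idea only and is not the authors' implementation.

```python
# Minimal sketch of the shielding idea, not the authors' implementation.
# Assumes a hypothetical safe_actions(state) oracle; in the paper this role
# is played by a reactive system synthesized from the temporal-logic spec.

import random

class Shield:
    """Overrides an agent's action whenever it would violate the safety spec."""

    def __init__(self, safe_actions):
        # safe_actions: state -> set of actions allowed by the specification
        self.safe_actions = safe_actions

    def correct(self, state, proposed_action):
        allowed = self.safe_actions(state)
        if proposed_action in allowed:
            return proposed_action               # learner's choice is safe: pass it through
        return random.choice(sorted(allowed))    # otherwise substitute an allowed action


# Toy usage: a 1-D corridor where stepping past position 3 violates the spec.
def corridor_safe_actions(state):
    return {-1, +1} if state < 3 else {-1}

shield = Shield(corridor_safe_actions)
print(shield.correct(3, +1))   # learner proposes +1 at the boundary; shield returns -1
```

Because the shield only intervenes on actions that would violate the specification, exploration is otherwise left untouched, which is the property the abstract alludes to when discussing requirements for preserving the learner's convergence guarantees.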

Cited by 315 publications (107 citation statements) · References 12 publications (15 reference statements)
“…A promising avenue is to extend single-agent methods for safe reinforcement learning (e.g. shielding [90]) and offline evolution with safe online adaptation (e.g. map-based constrained Bayesian optimisation [83]) to the decentralised multi-agent setting.…”
Section: Discussion (mentioning)
confidence: 99%
“…Thus, safe RL has great potential to conclude a model-free policy that could provide exploration for unknown system dynamics (García & Fernández, 2015). Combined with classic control techniques like formal methods (Alshiekh et al, 2018) and Model Predictive Control (MPC) (Zanon & Gros, 2020), safe policy is synthesised with theoretical guarantees. To maintain the property of forward invariance in a safe set, CBF provides a manner to obtain provably and scalable collision-free behaviours.…”
Section: Autonomous Driving (mentioning)
confidence: 99%
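The snippet above points to control barrier functions (CBFs) as a way to keep a safe set forward invariant. Below is a minimal, self-contained sketch of that idea for a toy 1-D single integrator; the constants, the barrier function, and the nominal controller are assumptions made up for illustration and are not taken from any of the cited works.

```python
# Minimal sketch of a CBF safety filter for a toy 1-D single integrator
# x_dot = u, keeping x <= X_MAX. Illustrative only; X_MAX, ALPHA, and the
# nominal controller are assumptions, not from the cited papers.

X_MAX = 1.0   # boundary of the safe set {x : h(x) >= 0}
ALPHA = 2.0   # class-K gain in the CBF condition h_dot >= -ALPHA * h

def h(x):
    return X_MAX - x                 # barrier function: positive inside the safe set

def cbf_filter(x, u_nominal):
    # For x_dot = u we have h_dot = -u, so the CBF condition -u >= -ALPHA * h(x)
    # reduces to the simple upper bound u <= ALPHA * h(x).
    u_max = ALPHA * h(x)
    return min(u_nominal, u_max)     # minimally modify the nominal input

# Toy usage: a greedy controller that always pushes right gets throttled near the boundary.
x, dt = 0.0, 0.05
for _ in range(100):
    u = cbf_filter(x, u_nominal=1.0)
    x += dt * u
print(round(x, 3), "<=", X_MAX)      # trajectory stays inside the safe set
```

The filter only alters the nominal input near the boundary of the safe set, which is the minimally invasive behaviour CBF-based safety filters are typically designed to provide.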
“…One approach is to integrate adaptive control with standard machine learning methods, such as NN [63], GP [25] and DNN [17,33,37]. The safety properties are usually considered on the whole system with some parts being learned in MPC problems [9,36], shielding [2], Control Barrier Functions (CBFs) [14], Hamiltonian analysis [5]. These approaches can guarantee safety for tasks such as stabilization [34,67] and tracking [22,49].…”
Section: A. Related Work, 1) Safety and Learning (mentioning)
confidence: 99%
“…Safe exploration [47] and safe optimization [60] of MDPs under unknown or selected cost functions can be formulated. Constrained MDPs are also common in RL tasks with Lagrangian methods [18] and generalized Lyapunov/barrier functions [14,19,21] and shielding [2]. Nonetheless much of the work remains confined to rather naive simulated tasks, such as moving a 2D agent on a grid map.…”
Section: A. Related Work, 1) Safety and Learning (mentioning)
confidence: 99%