2021
DOI: 10.1109/tac.2020.3024161

Safe Reinforcement Learning Using Robust MPC

Cited by 160 publications (99 citation statements)
References 31 publications
“…For uncertain dynamical systems, methods based on learning a model of the unknown system dynamics [13] or of the environmental constraints [14] have been proposed to ensure safety during learning. For instance, by predicting the system behaviour in the worst case, robust model predictive control [15] is able to provide safety and stability guarantees to reinforcement learning algorithms if the error in the learned model is bounded. In addition, [16] introduces an action governor that corrects the applied action when the system is predicted to become unsafe.…”
Section: A. Related Work
confidence: 99%
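The worst-case-prediction idea in the excerpt above can be made concrete with a minimal sketch. The code below is not the method of [15] or [16]; it is a one-step, numpy-only illustration of checking a proposed RL action against the worst-case prediction of a nominal linear model with bounded additive error, and falling back to a safe backup action when the check fails. All matrices, bounds, and names (A, B, w_max, x_max, backup_gain) are illustrative assumptions.

```python
# Hedged sketch of a one-step "action governor" under an assumed nominal model
# x+ = A x + B u + w with |w| <= w_max elementwise and a box state constraint.
import numpy as np

A = np.array([[1.0, 0.1], [0.0, 1.0]])   # nominal dynamics (double integrator)
B = np.array([[0.005], [0.1]])
w_max = np.array([0.01, 0.01])           # bound on the model error / disturbance
x_max = np.array([1.0, 0.5])             # state constraint |x| <= x_max
backup_gain = np.array([[-1.0, -1.5]])   # stabilising backup feedback u = K x

def worst_case_safe(x, u):
    """True if the worst-case one-step prediction stays inside the box."""
    x_nom = A @ x + B @ u
    return np.all(np.abs(x_nom) + w_max <= x_max)

def govern(x, u_rl):
    """Apply the RL action if it is certified safe, otherwise the backup action."""
    if worst_case_safe(x, u_rl):
        return u_rl
    return backup_gain @ x

x = np.array([0.3, 0.2])
print(govern(x, np.array([0.5])))   # modest action: certified safe, passed through
print(govern(x, np.array([4.0])))   # aggressive action: rejected, backup applied
```

A robust MPC would propagate such worst-case bounds over a full prediction horizon rather than a single step; the sketch only shows the certification-and-fallback pattern.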
“…Model inaccuracies can lead to sub-optimal MPC solution quality; [20] proposes to learn a policy that, at each timestep, chooses between the two actions with the best expected reward: one from model-free RL and one from a model-based trajectory optimizer. Alternatively, RL can be used to optimize the weights of an MPC-based Q-function approximator or to update a robust MPC parametrization [21]. When the model is completely unknown, [22] shows a way of learning a dynamics model to be used in MPC.…”
Section: A. Related Work
confidence: 99%
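A hedged sketch of the action-selection scheme attributed to [20] in the excerpt above: evaluate a model-free proposal and a model-based proposal with the same critic and apply the higher-scoring one. The callables rl_policy, mpc_policy, and q_fn are stand-ins, not the cited paper's implementation.

```python
# Pick the better of two action proposals under an assumed learned Q-function.
import numpy as np

def select_action(x, rl_policy, mpc_policy, q_fn):
    u_rl = rl_policy(x)       # proposal from the model-free RL policy
    u_mpc = mpc_policy(x)     # proposal from the model-based trajectory optimizer
    # Apply whichever proposal the critic scores higher.
    return u_rl if q_fn(x, u_rl) >= q_fn(x, u_mpc) else u_mpc

# Toy usage with placeholder policies and critic.
rl_policy = lambda x: np.array([0.5])
mpc_policy = lambda x: np.array([-0.2])
q_fn = lambda x, u: -float(np.sum((x + u) ** 2))
print(select_action(np.array([0.1]), rl_policy, mpc_policy, q_fn))
```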
“…In [6], a Q-function is learnt for iterative tasks by approximating it at discrete points in the state space as the infinite sum of stage costs. In [7] and [8], the MPC objective is parameterised by the matrices describing the quadratic stage and terminal costs; these are optimised to learn the MPC objective giving the best closed-loop performance. A gradient-descent-based approach is used to optimise the parameters in [7], while [8] computes solutions to the KKT optimality conditions.…”
Section: Introduction
confidence: 99%
“…In [7] and [8], the MPC objective is parameterised by the matrices describing the quadratic stage and terminal costs; these are optimised to learn the MPC objective giving the best closed-loop performance. A gradient-descent-based approach is used to optimise the parameters in [7], while [8] computes solutions to the KKT optimality conditions. Both works use data from expert demonstrations.…”
Section: Introduction
confidence: 99%
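The cost-parameterisation idea in the last two excerpts can be illustrated with a small sketch. It mirrors the gradient-based tuning described for [7] only in spirit: assuming linear dynamics and reducing the MPC to its unconstrained finite-horizon LQR feedback, the diagonal of the stage-cost matrix is tuned by finite-difference gradient descent on the simulated closed-loop cost of a mismatched plant. All matrices, horizons, and step sizes are assumptions.

```python
# Tune the quadratic cost weights of a predictive controller for closed-loop
# performance (hedged illustration, not the algorithm of [7] or [8]).
import numpy as np

A_nom = np.array([[1.0, 0.1], [0.0, 1.0]])     # model used by the controller
B_nom = np.array([[0.005], [0.1]])
A_true = np.array([[1.0, 0.12], [0.0, 0.98]])  # "true" plant with model mismatch
B_true = B_nom
R = np.array([[0.1]])
N = 20                                          # prediction horizon

def feedback_gain(Q):
    """First-step feedback gain of the finite-horizon LQR / unconstrained MPC."""
    P = Q
    for _ in range(N):
        K = np.linalg.solve(R + B_nom.T @ P @ B_nom, B_nom.T @ P @ A_nom)
        P = Q + A_nom.T @ P @ (A_nom - B_nom @ K)
    return K

def closed_loop_cost(theta, x0=np.array([1.0, 0.0]), T=50):
    """Simulate the true plant under the tuned controller and accumulate cost."""
    K = feedback_gain(np.diag(theta))
    x, J = x0.copy(), 0.0
    for _ in range(T):
        u = -K @ x
        J += float(x @ x + u @ R @ u)          # performance measured with fixed weights
        x = A_true @ x + B_true @ u
    return J

theta = np.array([1.0, 1.0])                    # diagonal of Q, the learnable parameters
for _ in range(30):                             # finite-difference gradient descent
    grad = np.zeros_like(theta)
    for i in range(len(theta)):
        e = np.zeros_like(theta); e[i] = 1e-3
        grad[i] = (closed_loop_cost(theta + e) - closed_loop_cost(theta - e)) / 2e-3
    theta = np.maximum(theta - 0.05 * grad, 1e-3)
print("tuned Q diagonal:", theta)
```

A constrained MPC rather than LQR feedback, or a KKT-based solve as in [8], would replace the inner controller; the outer loop of measuring closed-loop performance and adjusting the cost parameters is the part the excerpt describes.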