2021
DOI: 10.1109/tac.2020.3024161

Safe Reinforcement Learning Using Robust MPC

Cited by 160 publications (99 citation statements)
References 31 publications
“…For uncertain dynamical systems, methods based on learning a model of the unknown system dynamics [13] or of the environmental constraints [14] have been proposed to ensure safety during learning. For instance, by predicting the system behaviour in the worst case, robust model predictive control [15] is able to provide safety and stability guarantees to reinforcement learning algorithms if the error in the learned model is bounded. In addition, [16] introduces an action governor that corrects the applied action when the system is predicted to become unsafe.…”
Section: A. Related Work
confidence: 99%
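The worst-case-prediction idea in the excerpt above can be made concrete with a minimal sketch. The code below is not the method of [15] or [16]; it is a one-step, numpy-only illustration of checking a proposed RL action against the worst-case prediction of a nominal linear model with bounded additive error, and falling back to a safe backup action when the check fails. All matrices, bounds, and names (A, B, w_max, x_max, backup_gain) are illustrative assumptions.

```python
# Hedged sketch of a one-step "action governor" under an assumed nominal model
# x+ = A x + B u + w with |w| <= w_max elementwise and a box state constraint.
import numpy as np

A = np.array([[1.0, 0.1], [0.0, 1.0]])   # nominal dynamics (double integrator)
B = np.array([[0.005], [0.1]])
w_max = np.array([0.01, 0.01])           # bound on the model error / disturbance
x_max = np.array([1.0, 0.5])             # state constraint |x| <= x_max
backup_gain = np.array([[-1.0, -1.5]])   # stabilising backup feedback u = K x

def worst_case_safe(x, u):
    """True if the worst-case one-step prediction stays inside the box."""
    x_nom = A @ x + B @ u
    return np.all(np.abs(x_nom) + w_max <= x_max)

def govern(x, u_rl):
    """Apply the RL action if it is certified safe, otherwise the backup action."""
    if worst_case_safe(x, u_rl):
        return u_rl
    return backup_gain @ x

x = np.array([0.3, 0.2])
print(govern(x, np.array([0.5])))   # modest action: certified safe, passed through
print(govern(x, np.array([4.0])))   # aggressive action: rejected, backup applied
```

A robust MPC would propagate such worst-case bounds over a full prediction horizon rather than a single step; the sketch only shows the certification-and-fallback pattern.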
“…Model inaccuracies can lead to sub-optimal MPC solution quality; [20] proposes to learn a policy that, at each timestep, chooses between the two actions with the best expected reward: one from model-free RL and one from a model-based trajectory optimizer. Alternatively, RL can be used to optimize the weights of an MPC-based Q-function approximator or to update a robust MPC parametrization [21]. When the model is completely unknown, [22] shows a way of learning a dynamics model to be used in MPC.…”
Section: A. Related Work
confidence: 99%
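A hedged sketch of the action-selection scheme attributed to [20] in the excerpt above: evaluate a model-free proposal and a model-based proposal with the same critic and apply the higher-scoring one. The callables rl_policy, mpc_policy, and q_fn are stand-ins, not the cited paper's implementation.

```python
# Pick the better of two action proposals under an assumed learned Q-function.
import numpy as np

def select_action(x, rl_policy, mpc_policy, q_fn):
    u_rl = rl_policy(x)       # proposal from the model-free RL policy
    u_mpc = mpc_policy(x)     # proposal from the model-based trajectory optimizer
    # Apply whichever proposal the critic scores higher.
    return u_rl if q_fn(x, u_rl) >= q_fn(x, u_mpc) else u_mpc

# Toy usage with placeholder policies and critic.
rl_policy = lambda x: np.array([0.5])
mpc_policy = lambda x: np.array([-0.2])
q_fn = lambda x, u: -float(np.sum((x + u) ** 2))
print(select_action(np.array([0.1]), rl_policy, mpc_policy, q_fn))
```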
“…In [6], a Q-function is learnt for iterative tasks by approximating it at discrete points in the state space as the infinite sum of stage costs. In [7] and [8], the MPC objective is parameterised by the matrices describing the quadratic stage and terminal costs; these are optimised to learn the MPC objective giving the best closed-loop performance. A gradient-descent-based approach is used to optimise the parameters in [7], while [8] computes solutions to the KKT optimality conditions.…”
Section: Introduction
confidence: 99%
“…In [7] and [8], the MPC objective is parameterised by the matrices describing the quadratic stage and terminal costs; these are optimised to learn the MPC objective giving the best closed-loop performance. A gradient-descent-based approach is used to optimise the parameters in [7], while [8] computes solutions to the KKT optimality conditions. Both works use data from expert demonstrations.…”
Section: Introduction
confidence: 99%
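The cost-parameterisation idea in the last two excerpts can be illustrated with a small sketch. It mirrors the gradient-based tuning described for [7] only in spirit: assuming linear dynamics and reducing the MPC to its unconstrained finite-horizon LQR feedback, the diagonal of the stage-cost matrix is tuned by finite-difference gradient descent on the simulated closed-loop cost of a mismatched plant. All matrices, horizons, and step sizes are assumptions.

```python
# Tune the quadratic cost weights of a predictive controller for closed-loop
# performance (hedged illustration, not the algorithm of [7] or [8]).
import numpy as np

A_nom = np.array([[1.0, 0.1], [0.0, 1.0]])     # model used by the controller
B_nom = np.array([[0.005], [0.1]])
A_true = np.array([[1.0, 0.12], [0.0, 0.98]])  # "true" plant with model mismatch
B_true = B_nom
R = np.array([[0.1]])
N = 20                                          # prediction horizon

def feedback_gain(Q):
    """First-step feedback gain of the finite-horizon LQR / unconstrained MPC."""
    P = Q
    for _ in range(N):
        K = np.linalg.solve(R + B_nom.T @ P @ B_nom, B_nom.T @ P @ A_nom)
        P = Q + A_nom.T @ P @ (A_nom - B_nom @ K)
    return K

def closed_loop_cost(theta, x0=np.array([1.0, 0.0]), T=50):
    """Simulate the true plant under the tuned controller and accumulate cost."""
    K = feedback_gain(np.diag(theta))
    x, J = x0.copy(), 0.0
    for _ in range(T):
        u = -K @ x
        J += float(x @ x + u @ R @ u)          # performance measured with fixed weights
        x = A_true @ x + B_true @ u
    return J

theta = np.array([1.0, 1.0])                    # diagonal of Q, the learnable parameters
for _ in range(30):                             # finite-difference gradient descent
    grad = np.zeros_like(theta)
    for i in range(len(theta)):
        e = np.zeros_like(theta); e[i] = 1e-3
        grad[i] = (closed_loop_cost(theta + e) - closed_loop_cost(theta - e)) / 2e-3
    theta = np.maximum(theta - 0.05 * grad, 1e-3)
print("tuned Q diagonal:", theta)
```

A constrained MPC rather than LQR feedback, or a KKT-based solve as in [8], would replace the inner controller; the outer loop of measuring closed-loop performance and adjusting the cost parameters is the part the excerpt describes.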