Robotics: Science and Systems XVII 2021
DOI: 10.15607/rss.2021.xvii.077

Safety and Liveness Guarantees through Reach-Avoid Reinforcement Learning

Abstract: Reach-avoid optimal control problems, in which the system must reach certain goal conditions while staying clear of unacceptable failure modes, are central to safety and liveness assurance for autonomous robotic systems, but their exact solutions are intractable for complex dynamics and environments. Recent successes in the use of reinforcement learning methods to approximately solve optimal control problems with performance objectives make their application to certification problems attractive; however, the L…
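For context, a standard way to formalize the reach-avoid objective described above is as a min–max over trajectory margins. The symbols and sign conventions below (a target margin $\ell$ that is positive inside the goal set, a failure margin $g$ that is positive outside the failure set) are illustrative assumptions, not notation taken from the paper:

$$
\mathcal{V}(x_0) \;=\; \max_{\pi}\; \max_{t \ge 0}\; \min\Big\{\, \ell(x_t),\; \min_{s \le t} g(x_s) \,\Big\},
\qquad x_{t+1} = f\big(x_t, \pi(x_t)\big).
$$

Under these conventions, $\mathcal{V}(x_0) > 0$ exactly when some policy drives the system into the goal set while never entering the failure set beforehand, which is the safety-and-liveness condition the abstract refers to.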

Cited by 21 publications (7 citation statements). References 15 publications.
“…Chow et al [8] use Lagrangian methods to transform the constrained optimization into an unconstrained one over the primal variable (policy) and the dual variable (penalty coefficient). A recent line of works building on reachability analysis argues that optimizing the sum of rewards and penalties is not an accurate encoding of safety [10,11]. Instead, they propose to use RL to find a value function, which is an approximate solution of a Hamilton-Jacobi partial differential equation [22,23].…”
Section: Related Work
confidence: 99%
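To make the reachability-style alternative concrete, here is a minimal tabular sketch of value iteration with a discounted reach-avoid backup in the spirit of [10,11]; the specific backup operator, discount weighting, and margin conventions are assumptions for illustration, not a verbatim reproduction of those papers:

```python
import numpy as np

def reach_avoid_value_iteration(P, l, g, gamma=0.99, tol=1e-6):
    """Tabular value iteration with an illustrative discounted reach-avoid backup.

    P[a]  -- array mapping each state to its successor under action a (deterministic, assumed)
    l[s]  -- target margin, > 0 inside the goal set (assumed sign convention)
    g[s]  -- safety margin, > 0 outside the failure set (assumed sign convention)
    Returns V with V[s] > 0 roughly meaning "the goal can be reached without failing".
    """
    terminal = np.minimum(l, g).astype(float)   # outcome if the episode ended immediately
    V = terminal.copy()
    while True:
        # Best achievable value over actions, evaluated at the successor states.
        best_next = np.max(np.stack([V[P[a]] for a in range(len(P))]), axis=0)
        # Either the goal is reached now (l) or we continue (best_next);
        # in both cases the value is capped by the safety margin g.
        backup = np.minimum(g, np.maximum(l, best_next))
        # Blending with the terminal outcome keeps the operator a gamma-contraction.
        V_new = (1 - gamma) * terminal + gamma * backup
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
```

With gamma annealed toward 1, the fixed point approaches the undiscounted reach-avoid value; replacing the full sweep with sampled transitions roughly corresponds to the Q-learning-style variants these works train with function approximators.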
“…Instead, they propose to use RL to find a value function, which is an approximate solution of a Hamilton-Jacobi partial differential equation [22,23]. Hsu et al. [11] show that reachability-based RL has fewer safety violations during deployment compared to standard RL.…”
Section: Related Work
confidence: 99%
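The excerpt does not say how the learned value function is used at run time, but a common deployment pattern (an assumption here, sketched for illustration) is a least-restrictive filter: execute the task policy while the learned safety value stays above a margin, and fall back to the safest action otherwise:

```python
import numpy as np

def shielded_action(state, task_policy, Q_safe, threshold=0.0):
    """Illustrative least-restrictive safety filter around a learned safety/reach-avoid
    Q-function Q_safe[s, a] (higher = safer); all names and the threshold are assumptions."""
    a_task = task_policy(state)
    if Q_safe[state, a_task] > threshold:
        return a_task                       # task action keeps the safety value positive
    return int(np.argmax(Q_safe[state]))    # otherwise override with the safest action
```

Filtering of this kind is one mechanism by which reachability-based values can reduce violations during deployment relative to reward-shaped RL.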
“…However, directly learning HJ value functions by leveraging neural networks for supervised approximate DP does not provide hard guarantees [18]. Another approach, which leverages the similarity of HJ reachability and reinforcement learning, does not necessarily encode safety [19].…”
Section: Introduction
confidence: 99%
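"Supervised approximate DP" here means repeatedly regressing a function approximator onto dynamic-programming backup targets for the HJ value function. The sketch below is a generic fitted-value-iteration loop; the model, the backup operator, and all names are assumptions rather than the method of [18], and the regression error introduced in each sweep is exactly why the learned value carries no hard guarantee:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def fitted_value_iteration(states, backup_targets, n_sweeps=50):
    """Generic supervised approximate DP: fit a regressor to backup targets each sweep.

    states          -- (N, d) array of sampled states
    backup_targets  -- callable taking the current value estimate V (a function of states)
                       and returning the (N,) array of DP backup values; the backup
                       operator itself is left abstract here.
    """
    model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500)
    V = lambda x: np.zeros(len(x))      # initial value estimate
    for _ in range(n_sweeps):
        targets = backup_targets(V)     # DP backup under the current estimate
        model.fit(states, targets)      # supervised step; approximation error enters here
        V = model.predict               # next sweep bootstraps from the fitted model
    return V
```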