2022
DOI: 10.48550/arxiv.2201.08355
Preprint
Sim-to-Lab-to-Real: Safe Reinforcement Learning with Shielding and Generalization Guarantees

Abstract: Safety is a critical component of autonomous systems and remains a challenge for learning-based policies to be utilized in the real world. In particular, policies learned using reinforcement learning often fail to generalize to novel environments due to unsafe behavior. In this paper, we propose Sim-to-Lab-to-Real to safely close the reality gap. To improve safety, we apply a dual policy setup where a performance policy is trained using the cumulative task reward and a backup (safety) policy is trained by solv…
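The dual-policy setup described in the abstract can be sketched as a simple runtime shield: a performance policy acts by default, and a backup (safety) policy overrides it whenever a learned safety value function predicts danger. The sketch below is a minimal illustration of that idea; all names (`perf_policy`, `backup_policy`, `safety_value`, the threshold, and the toy 1-D dynamics) are illustrative assumptions, not the paper's actual interface.

```python
# Minimal sketch of dual-policy shielding: the performance policy acts
# unless a safety critic predicts the action is unsafe, in which case
# the backup policy takes over. The 1-D world and all function names
# are hypothetical stand-ins for the learned components in the paper.

THRESHOLD = 0.0  # sign convention assumed here: value < 0 means unsafe


def perf_policy(state):
    # stand-in for the reward-trained performance policy
    return "go_fast"


def backup_policy(state):
    # stand-in for the reachability-trained backup (safety) policy
    return "brake"


def safety_value(state, action):
    # stand-in for a Hamilton-Jacobi safety value function; here it is
    # negative if the action would bring the agent within 1 m of an obstacle
    position, obstacle = state
    next_position = position + (1.0 if action == "go_fast" else 0.0)
    return (obstacle - next_position) - 1.0


def shielded_action(state):
    action = perf_policy(state)
    if safety_value(state, action) < THRESHOLD:
        action = backup_policy(state)  # shield engages: override
    return action


print(shielded_action((0.0, 10.0)))  # far from obstacle -> go_fast
print(shielded_action((8.5, 10.0)))  # near obstacle -> brake
```

In this shape the shield is "plug-and-play" in the sense the first citation statement mentions: it wraps any existing performance policy without retraining it.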

Cited by 2 publications (2 citation statements); references 28 publications.
“…Our method can thus be used in a ‘plug-and-play’ manner to improve the safety of an existing policy. One recent work [47] also improves safety of policies trained using PAC-Bayes theory and detects failure by learning a safety value function using Hamilton-Jacobi reachability-based reinforcement learning, but it does not explicitly provide guarantees on failure prediction.…”
Section: Related Work
“…Since autonomous vehicles require interactions with other road users and are safety-critical by nature, there has been a lot of effort, from regulators, industry, and researchers alike, in ensuring safe AV operations [7,8,9,10,11,12,13,14,15,16,17] (see [18] for a review). A common approach is to compute “inevitable collision sets” (ICS), e.g., via Hamilton-Jacobi reachability computation [19], with selected assumptions on other agents’ behaviors, and perform shielding, where the AV flags a situation as unsafe whenever it is close to entering the ICS and executes an appropriate evasive action (e.g., [1,20,21]). A primary challenge is selecting reasonable behavior assumptions for ICS computation to balance tractability, interpretability, and compatibility with real-world driving interactions [22,23].…”
Section: Related Work
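The ICS idea in the statement above can be made concrete with a toy membership test. The sketch below assumes a 1-D braking model: the ego vehicle moves at speed `v` toward a stopped obstacle at gap `d` and can decelerate at most `a_max`; the state is in the ICS if even maximal braking cannot stop in time. The model and parameter names are illustrative assumptions, not taken from the cited works.

```python
# Illustrative "inevitable collision set" (ICS) membership test for a
# 1-D braking model. A state (gap, speed) is in the ICS when the
# worst-case stopping distance v^2 / (2 * a_max) already exceeds the
# gap, so a collision is unavoidable under any control.

def in_ics(gap_m: float, speed_mps: float, a_max: float = 6.0) -> bool:
    """True if a collision is inevitable even under maximal braking."""
    stopping_distance = speed_mps ** 2 / (2.0 * a_max)
    return stopping_distance >= gap_m


# A shield would flag states approaching the ICS boundary and trigger
# an evasive maneuver before the set is entered.
print(in_ics(gap_m=50.0, speed_mps=20.0))  # ~33.3 m needed -> False
print(in_ics(gap_m=20.0, speed_mps=20.0))  # ~33.3 m needed -> True
```

Real ICS computations replace this closed-form check with a Hamilton-Jacobi reachability solve over the joint dynamics and assumed behaviors of other agents, which is where the tractability/interpretability trade-off mentioned above arises.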