2020 AIAA/IEEE 39th Digital Avionics Systems Conference (DASC)
DOI: 10.1109/dasc50938.2020.9256446
Runtime Safety Assurance Using Reinforcement Learning

Cited by 12 publications (7 citation statements)
References 11 publications
“…et al. in [14] applied a similar reward scheme to an aircraft autopilot system, which punished the RL agent whenever the emergency controller had to be deployed. They showed that choosing higher values of λ encourages more conservative behavior, whereas faster policies with more interruptions can be expected for lower values of λ.…”
Section: B. Minimizing Safety Interference With Reinforcement Learning (mentioning)
confidence: 99%
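The λ trade-off described above can be made concrete with a small reward-shaping sketch. This is an illustration under assumed names (shaped_reward, intervention, and lam are hypothetical), not the exact formulation from [14]:

```python
def shaped_reward(task_reward: float, intervention: bool, lam: float) -> float:
    """Task reward minus a fixed penalty lam whenever the emergency
    (recovery) controller has to take over from the RL agent."""
    return task_reward - (lam if intervention else 0.0)
```

With a higher lam, every takeover is expensive, so the learned policy tends toward conservative behavior; with a lower lam, the agent accepts more interruptions in exchange for faster policies.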
“…Another approach to address safety for RL policies is to utilize a safety layer which filters out unsafe actions, as proposed in [1], [13], [14]. This safety layer must be easily verifiable, without black boxes such as deep neural networks, which are hard to verify, inside its architecture [15].…”
Section: Introduction (mentioning)
confidence: 99%
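As a rough sketch of such a filtering layer (all names are hypothetical; this is not the architecture of [1], [13], or [14]), the learned policy proposes action values while a simple, separately verifiable mask rejects unsafe actions:

```python
import numpy as np

def filtered_action(q_values: np.ndarray, safe_mask: np.ndarray, fallback: int) -> int:
    """Pick the highest-valued action among those the safety layer marks
    safe; if none are safe, fall back to a certified recovery action.

    q_values  : action values proposed by the learned (black-box) policy
    safe_mask : boolean mask computed by the simple, verifiable safety layer
    fallback  : index of a known-safe recovery action
    """
    if not safe_mask.any():
        return fallback
    return int(np.argmax(np.where(safe_mask, q_values, -np.inf)))
```

Keeping the mask logic this simple is what makes the layer amenable to verification, in contrast to the deep network it wraps.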
“…We introduce safe distributional RL, which considers maximum uncertainty and capability inside the safety verification layer. During training the RL agent is punished for every safety intervention, similar to [13], [14]. However, the main difference is that it learns distributions instead of expected values for each action's return, which helps provide risk-aware policies that can adapt their conservatism according to the existing uncertainty in the environment.…”
Section: Introduction (mentioning)
confidence: 99%
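A generic sketch of why learning return distributions helps: given per-action return quantiles (as in quantile-based distributional RL), the agent can act on a risk-sensitive statistic of the return rather than its mean. The function below is an assumed illustration, not the cited paper's method:

```python
import numpy as np

def risk_averse_action(quantiles: np.ndarray, alpha: float = 0.25) -> int:
    """Choose the action with the best average over its worst alpha-fraction
    of return quantiles (a CVaR-style criterion).

    quantiles : shape (n_actions, n_quantiles), sorted ascending per action
    alpha     : fraction of the left tail to average over
    """
    k = max(1, int(alpha * quantiles.shape[1]))
    return int(np.argmax(quantiles[:, :k].mean(axis=1)))
```

Under low uncertainty the tail average approaches the mean and the policy behaves like an ordinary greedy policy; under high uncertainty the left tail drags risky actions down, yielding the adaptive conservatism described above.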