2020 AIAA/IEEE 39th Digital Avionics Systems Conference (DASC)
DOI: 10.1109/dasc50938.2020.9256446
Runtime Safety Assurance Using Reinforcement Learning

Cited by 12 publications (7 citation statements)
References 11 publications
“…et al. in [14] applied a similar reward scheme to an aircraft autopilot system, which punished the RL agent whenever the emergency controller had to be deployed. They showed that choosing higher values of λ encourages more conservative behavior, whereas faster policies with more interruptions can be expected for lower values of λ.…”
Section: B. Minimizing Safety Interference With Reinforcement Learning (mentioning)
confidence: 99%
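The λ trade-off described above can be made concrete with a small reward-shaping sketch. This is an illustration under assumed names (shaped_reward, intervention, and lam are hypothetical), not the exact formulation from [14]:

```python
def shaped_reward(task_reward: float, intervention: bool, lam: float) -> float:
    """Task reward minus a fixed penalty lam whenever the emergency
    (recovery) controller has to take over from the RL agent."""
    return task_reward - (lam if intervention else 0.0)
```

With a higher lam, every takeover is expensive, so the learned policy tends toward conservative behavior; with a lower lam, the agent accepts more interruptions in exchange for faster policies.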
“…Another approach to address safety for RL policies is to utilize a safety layer which filters out unsafe actions, as proposed in [1], [13], [14]. This safety layer must be easily verifiable, without black boxes such as deep neural networks, which are hard to verify, inside its architecture [15].…”
Section: Introduction (mentioning)
confidence: 99%
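As a rough sketch of such a filtering layer (all names are hypothetical; this is not the architecture of [1], [13], or [14]), the learned policy proposes action values while a simple, separately verifiable mask rejects unsafe actions:

```python
import numpy as np

def filtered_action(q_values: np.ndarray, safe_mask: np.ndarray, fallback: int) -> int:
    """Pick the highest-valued action among those the safety layer marks
    safe; if none are safe, fall back to a certified recovery action.

    q_values  : action values proposed by the learned (black-box) policy
    safe_mask : boolean mask computed by the simple, verifiable safety layer
    fallback  : index of a known-safe recovery action
    """
    if not safe_mask.any():
        return fallback
    return int(np.argmax(np.where(safe_mask, q_values, -np.inf)))
```

Keeping the mask logic this simple is what makes the layer amenable to verification, in contrast to the deep network it wraps.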
“…We introduce safe distributional RL, which considers maximum uncertainty and capability inside the safety verification layer. During training the RL agent is punished for every safety intervention, similar to [13], [14]. However, the main difference is that it learns distributions instead of expected values for each action's return, which helps provide risk-aware policies that can adapt their conservatism according to the existing uncertainty in the environment.…”
Section: Introduction (mentioning)
confidence: 99%
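A generic sketch of why learning return distributions helps: given per-action return quantiles (as in quantile-based distributional RL), the agent can act on a risk-sensitive statistic of the return rather than its mean. The function below is an assumed illustration, not the cited paper's method:

```python
import numpy as np

def risk_averse_action(quantiles: np.ndarray, alpha: float = 0.25) -> int:
    """Choose the action with the best average over its worst alpha-fraction
    of return quantiles (a CVaR-style criterion).

    quantiles : shape (n_actions, n_quantiles), sorted ascending per action
    alpha     : fraction of the left tail to average over
    """
    k = max(1, int(alpha * quantiles.shape[1]))
    return int(np.argmax(quantiles[:, :k].mean(axis=1)))
```

Under low uncertainty the tail average approaches the mean and the policy behaves like an ordinary greedy policy; under high uncertainty the left tail drags risky actions down, yielding the adaptive conservatism described above.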