2021
DOI: 10.48550/arxiv.2101.00531
Preprint

Context-Aware Safe Reinforcement Learning for Non-Stationary Environments

Abstract: Safety is a critical concern when deploying reinforcement learning agents for realistic tasks. Recently, safe reinforcement learning algorithms have been developed to optimize the agent's performance while avoiding violations of safety constraints. However, few studies have addressed the non-stationary disturbances in the environments, which may cause catastrophic outcomes. In this paper, we propose the context-aware safe reinforcement learning (CASRL) method, a meta-learning framework to realize safe adaptation in non-stationary environments.
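The abstract only sketches the method, but its core idea of inferring a latent context (the non-stationary disturbance) from recent transitions and using that belief for prediction can be illustrated with a minimal sketch. Everything below (the linear-Gaussian model, class and variable names) is an assumption made for illustration, not the paper's actual implementation, which uses a learned probabilistic latent-variable model:

```python
# Minimal sketch of latent-context inference for non-stationary dynamics.
# Assumes a toy model s' = s + a + c + noise, where c is the hidden
# disturbance; the conjugate Gaussian update below is illustrative only.
import numpy as np

class GaussianContextEstimator:
    """Maintains a Gaussian posterior over a scalar latent disturbance c."""

    def __init__(self, prior_mean=0.0, prior_var=1.0, noise_std=0.1):
        self.mean, self.var = prior_mean, prior_var
        self.noise_var = noise_std ** 2

    def update(self, s, a, s_next):
        # Each transition yields a noisy observation of c: y = s' - s - a.
        y = s_next - s - a
        k = self.var / (self.var + self.noise_var)  # Kalman-style gain
        self.mean += k * (y - self.mean)
        self.var *= (1.0 - k)

    def predict_next_state(self, s, a):
        # Predictive mean and std of s' under the current context belief.
        return s + a + self.mean, np.sqrt(self.var + self.noise_var)

# Usage: infer a wind-like disturbance from a few recent transitions.
rng = np.random.default_rng(0)
true_c = 0.7                        # hidden non-stationary disturbance
est = GaussianContextEstimator()
s = 0.0
for _ in range(20):
    a = rng.uniform(-1, 1)
    s_next = s + a + true_c + rng.normal(0, 0.1)
    est.update(s, a, s_next)
    s = s_next
print(f"inferred context: {est.mean:.2f} +/- {np.sqrt(est.var):.2f}")
```

In a safe-RL setting, the posterior variance over the context is what makes the adaptation uncertainty-aware: a wide posterior signals that safety margins should be widened until more context data arrives.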

Cited by 6 publications (5 citation statements)
References 37 publications (52 reference statements)
“…Safe RL. One type of approach utilizes domain knowledge of the target problem to improve the safety of an RL agent, such as designing a safety filter [21], assuming a sophisticated system dynamics model [20,22,23], or incorporating expert interventions [24,25]. The Constrained Markov Decision Process (CMDP) is another commonly used framework for modeling the safe RL problem, which can be solved via many constrained optimization techniques [4].…”
Section: Related Work
confidence: 99%
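For reference, the CMDP formalism invoked here maximizes expected return subject to bounds on expected cumulative costs; a standard statement, together with the Lagrangian relaxation that many of the constrained optimization techniques build on, is:

```latex
% Constrained MDP: maximize return subject to cost budgets d_i
\max_{\pi} \; J_R(\pi) = \mathbb{E}_{\tau \sim \pi}\Big[\sum_{t=0}^{\infty} \gamma^{t} R(s_t, a_t)\Big]
\quad \text{s.t.} \quad
J_{C_i}(\pi) = \mathbb{E}_{\tau \sim \pi}\Big[\sum_{t=0}^{\infty} \gamma^{t} C_i(s_t, a_t)\Big] \le d_i

% Lagrangian relaxation: an unconstrained saddle-point problem
\min_{\lambda \ge 0} \, \max_{\pi} \; J_R(\pi) - \sum_i \lambda_i \big( J_{C_i}(\pi) - d_i \big)
```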
“…SAVED [181] and RCE [109] both use an ensemble of neural networks as the dynamics model to estimate the dynamics prediction uncertainty and solve the constrained optimization problem in a model-predictive-control fashion: the former formulates chance constraints to ensure safety from a probabilistic perspective, while the latter considers the worst-case unsafe scenario. CASRL [27] further extends previous approaches from stationary environments to non-stationary environments by modeling the non-stationary disturbances as probabilistic latent variables.…”
Section: Uncertainty-Aware Methods
confidence: 98%
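To make that mechanism concrete, here is a minimal random-shooting sketch of the chance-constrained variant described above. The toy linear dynamics, the five-member ensemble, and all function names are assumptions for illustration; the cited methods use learned neural-network ensembles:

```python
# Uncertainty-aware constrained MPC sketch: roll candidate action
# sequences through an ensemble of dynamics models and keep only the
# sequences whose estimated probability of constraint violation is low.
import numpy as np

rng = np.random.default_rng(1)

# Toy "ensemble": five linear models that disagree slightly about the
# drift term, standing in for trained neural-network dynamics models.
ensemble = [lambda s, a, b=b: s + a + 0.05 * b for b in rng.normal(size=5)]

def unsafe(s):
    return s > 1.0                  # illustrative unsafe region

def violation_prob(s0, actions):
    # Fraction of ensemble members whose rollout ever enters the unsafe
    # region: a Monte-Carlo estimate of the chance constraint.
    bad = 0
    for f in ensemble:
        s = s0
        for a in actions:
            s = f(s, a)
            if unsafe(s):
                bad += 1
                break
    return bad / len(ensemble)

def plan(s0, horizon=5, n_candidates=200, delta=0.2, goal=0.9):
    # Random-shooting MPC: among candidates that satisfy the chance
    # constraint P(violation) <= delta, pick the sequence whose mean
    # terminal state lands closest to the goal.
    best, best_err = None, np.inf
    for _ in range(n_candidates):
        actions = rng.uniform(-0.3, 0.3, size=horizon)
        if violation_prob(s0, actions) > delta:
            continue
        finals = []
        for f in ensemble:
            s = s0
            for a in actions:
                s = f(s, a)
            finals.append(s)
        err = abs(np.mean(finals) - goal)
        if err < best_err:
            best, best_err = actions, err
    return best

actions = plan(0.0)
print("first planned action:", None if actions is None else round(actions[0], 3))
```

The worst-case variant mentioned for RCE would replace the probability estimate with a rejection of any sequence that a single ensemble member predicts to be unsafe.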
“…Learning to adapt. Meta-RL has recently been proposed to achieve fast adaptation of a pre-trained policy in the presence of dynamic variations [18]-[23]. Despite the impressive fast-adaptation performance demonstrated by these methods, the intermediate policies learned during the adaptation phase will most likely still fail.…”
Section: A. Related Work
confidence: 99%