2021
DOI: 10.48550/arxiv.2101.00531
Preprint

Context-Aware Safe Reinforcement Learning for Non-Stationary Environments

Abstract: Safety is a critical concern when deploying reinforcement learning agents for realistic tasks. Recently, safe reinforcement learning algorithms have been developed to optimize the agent's performance while avoiding violations of safety constraints. However, few studies have addressed the non-stationary disturbances in the environments, which may cause catastrophic outcomes. In this paper, we propose the context-aware safe reinforcement learning (CASRL) method, a meta-learning framework to realize safe adaptation in non-stationary environments.
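The abstract only sketches the method, but its core idea of inferring a latent context (the non-stationary disturbance) from recent transitions and using that belief for prediction can be illustrated with a minimal sketch. Everything below (the linear-Gaussian model, class and variable names) is an assumption made for illustration, not the paper's actual implementation, which uses a learned probabilistic latent-variable model:

```python
# Minimal sketch of latent-context inference for non-stationary dynamics.
# Assumes a toy model s' = s + a + c + noise, where c is the hidden
# disturbance; the conjugate Gaussian update below is illustrative only.
import numpy as np

class GaussianContextEstimator:
    """Maintains a Gaussian posterior over a scalar latent disturbance c."""

    def __init__(self, prior_mean=0.0, prior_var=1.0, noise_std=0.1):
        self.mean, self.var = prior_mean, prior_var
        self.noise_var = noise_std ** 2

    def update(self, s, a, s_next):
        # Each transition yields a noisy observation of c: y = s' - s - a.
        y = s_next - s - a
        k = self.var / (self.var + self.noise_var)  # Kalman-style gain
        self.mean += k * (y - self.mean)
        self.var *= (1.0 - k)

    def predict_next_state(self, s, a):
        # Predictive mean and std of s' under the current context belief.
        return s + a + self.mean, np.sqrt(self.var + self.noise_var)

# Usage: infer a wind-like disturbance from a few recent transitions.
rng = np.random.default_rng(0)
true_c = 0.7                        # hidden non-stationary disturbance
est = GaussianContextEstimator()
s = 0.0
for _ in range(20):
    a = rng.uniform(-1, 1)
    s_next = s + a + true_c + rng.normal(0, 0.1)
    est.update(s, a, s_next)
    s = s_next
print(f"inferred context: {est.mean:.2f} +/- {np.sqrt(est.var):.2f}")
```

In a safe-RL setting, the posterior variance over the context is what makes the adaptation uncertainty-aware: a wide posterior signals that safety margins should be widened until more context data arrives.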

Cited by 6 publications (5 citation statements)
References 37 publications (52 reference statements)
“…Safe RL. One type of approach utilizes domain knowledge of the target problem to improve the safety of an RL agent, such as designing a safety filter [21], assuming a sophisticated system dynamics model [20,22,23], or incorporating expert interventions [24,25]. The Constrained Markov Decision Process (CMDP) is another commonly used framework for modeling the safe RL problem, which can be solved via many constrained optimization techniques [4].…”
Section: Related Work
confidence: 99%
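For reference, the CMDP formalism invoked here maximizes expected return subject to bounds on expected cumulative costs; a standard statement, together with the Lagrangian relaxation that many of the constrained optimization techniques build on, is:

```latex
% Constrained MDP: maximize return subject to cost budgets d_i
\max_{\pi} \; J_R(\pi) = \mathbb{E}_{\tau \sim \pi}\Big[\sum_{t=0}^{\infty} \gamma^{t} R(s_t, a_t)\Big]
\quad \text{s.t.} \quad
J_{C_i}(\pi) = \mathbb{E}_{\tau \sim \pi}\Big[\sum_{t=0}^{\infty} \gamma^{t} C_i(s_t, a_t)\Big] \le d_i

% Lagrangian relaxation: an unconstrained saddle-point problem
\min_{\lambda \ge 0} \, \max_{\pi} \; J_R(\pi) - \sum_i \lambda_i \big( J_{C_i}(\pi) - d_i \big)
```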
“…SAVED [181] and RCE [109] both use an ensemble of neural networks as the dynamics model to estimate the dynamics prediction uncertainty and solve the constrained optimization problem in a model-predictive-control fashion: the former formulates chance constraints to ensure safety from a probabilistic perspective, while the latter considers the worst-case unsafe scenario. CASRL [27] further extends previous approaches from stationary environments to non-stationary environments by modeling the non-stationary disturbances as probabilistic latent variables.…”
Section: Uncertainty-Aware Methods
confidence: 98%
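To make that mechanism concrete, here is a minimal random-shooting sketch of the chance-constrained variant described above. The toy linear dynamics, the five-member ensemble, and all function names are assumptions for illustration; the cited methods use learned neural-network ensembles:

```python
# Uncertainty-aware constrained MPC sketch: roll candidate action
# sequences through an ensemble of dynamics models and keep only the
# sequences whose estimated probability of constraint violation is low.
import numpy as np

rng = np.random.default_rng(1)

# Toy "ensemble": five linear models that disagree slightly about the
# drift term, standing in for trained neural-network dynamics models.
ensemble = [lambda s, a, b=b: s + a + 0.05 * b for b in rng.normal(size=5)]

def unsafe(s):
    return s > 1.0                  # illustrative unsafe region

def violation_prob(s0, actions):
    # Fraction of ensemble members whose rollout ever enters the unsafe
    # region: a Monte-Carlo estimate of the chance constraint.
    bad = 0
    for f in ensemble:
        s = s0
        for a in actions:
            s = f(s, a)
            if unsafe(s):
                bad += 1
                break
    return bad / len(ensemble)

def plan(s0, horizon=5, n_candidates=200, delta=0.2, goal=0.9):
    # Random-shooting MPC: among candidates that satisfy the chance
    # constraint P(violation) <= delta, pick the sequence whose mean
    # terminal state lands closest to the goal.
    best, best_err = None, np.inf
    for _ in range(n_candidates):
        actions = rng.uniform(-0.3, 0.3, size=horizon)
        if violation_prob(s0, actions) > delta:
            continue
        finals = []
        for f in ensemble:
            s = s0
            for a in actions:
                s = f(s, a)
            finals.append(s)
        err = abs(np.mean(finals) - goal)
        if err < best_err:
            best, best_err = actions, err
    return best

actions = plan(0.0)
print("first planned action:", None if actions is None else round(actions[0], 3))
```

The worst-case variant mentioned for RCE would replace the probability estimate with a rejection of any sequence that a single ensemble member predicts to be unsafe.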
“…Learning to adapt. Meta-RL has recently been proposed to achieve fast adaptation of a pre-trained policy in the presence of dynamic variations [18]-[23]. Despite the impressive fast-adaptation performance demonstrated by these methods, the intermediate policies learned during the adaptation phase will most likely still fail.…”
Section: A. Related Work
confidence: 99%