2022
DOI: 10.48550/arxiv.2204.09424
Preprint
SAAC: Safe Reinforcement Learning as an Adversarial Game of Actor-Critics

Abstract: Although Reinforcement Learning (RL) is effective for sequential decision-making problems under uncertainty, it still fails to thrive in real-world systems where risk or safety is a binding constraint. In this paper, we formulate the RL problem with safety constraints as a non-zero-sum game. When deployed with maximum entropy RL, this formulation leads to a safe adversarially guided soft actor-critic framework, called SAAC. In SAAC, the adversary aims to break the safety constraint while the RL agent aims to …
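The abstract is truncated, so the following is only a minimal, hypothetical sketch of the two-critic adversarial shape it describes: one critic scores task reward, the other scores safety cost, the adversary follows the cost critic, and the agent is steered away from adversary-preferred actions. The softmax maximum-entropy policies, the temperature, and the log-probability repulsion term are all assumptions for illustration, not the paper's stated update.

```python
# Hypothetical sketch of an adversarial two-critic setup -- NOT SAAC's actual
# algorithm. COST_WEIGHT and the log-probability repulsion term are assumptions.
import numpy as np

rng = np.random.default_rng(0)
N_STATES, N_ACTIONS = 4, 3
TEMP = 0.5          # maximum-entropy temperature (assumed)
COST_WEIGHT = 1.0   # strength of the push away from the adversary (assumed)

# Two critics over a tabular toy problem: task reward vs. safety cost.
q_reward = rng.normal(size=(N_STATES, N_ACTIONS))
q_cost = rng.normal(size=(N_STATES, N_ACTIONS))

def soft_policy(q):
    """Maximum-entropy (softmax) policy induced by a critic."""
    z = q / TEMP
    z -= z.max(axis=1, keepdims=True)   # numerical stability
    p = np.exp(z)
    return p / p.sum(axis=1, keepdims=True)

# Adversary: soft policy on the safety-cost critic (tries to violate the constraint).
pi_adversary = soft_policy(q_cost)

# Agent: soft policy on task reward, repelled from actions the adversary prefers
# by subtracting a log-probability bonus (one plausible coupling; assumed).
pi_agent = soft_policy(q_reward - COST_WEIGHT * np.log(pi_adversary + 1e-8))

print(pi_adversary.round(2))
print(pi_agent.round(2))
```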

Cited by 2 publications (2 citation statements)
References 22 publications
“…Classification. Because this analysis operates on sequence features, it is consistent with reinforcement learning applied to sequential decision problems [8]. This kind of problem, which requires step-by-step operations on sequence data, can be solved using policy gradients in reinforcement learning.…”
Section: Introduction (mentioning)
confidence: 74%
“…Safe RL. Constrained optimization techniques are usually adopted to solve safe RL problems (García & Fernández, 2015; Sootla et al., 2022; Yang et al., 2021; Flet-Berliac & Basu, 2022). Lagrangian-based methods use a multiplier to penalize constraint violations (Chow et al., 2017; Tessler et al., 2018; Stooke et al., 2020; Chen et al., 2021b).…”
Section: Related Work (mentioning)
confidence: 99%
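As a concrete illustration of the Lagrangian recipe named in that statement, here is a minimal, self-contained sketch in which a multiplier penalizes constraint violations and is adapted by dual ascent. The toy objective, the cost limit, and the step sizes are assumptions for illustration, not details from any of the cited papers.

```python
# Lagrangian-style constrained optimization sketch: ascend the penalized
# objective in theta, raise lambda by dual ascent while the constraint is
# violated. All numbers and the evaluate() stand-in are assumptions.

COST_LIMIT = 0.2   # constraint threshold d (assumed)
PRIMAL_LR = 0.1    # policy-parameter step size (assumed)
DUAL_LR = 0.05     # multiplier step size (assumed)

def evaluate(theta):
    """Stand-in for policy evaluation: (expected reward, expected cost)."""
    reward = -(theta - 1.0) ** 2        # reward peaks at theta = 1
    cost = max(0.0, theta - 0.5)        # cost grows past theta = 0.5
    return reward, cost

def penalized(theta, lam):
    """Lagrangian L(theta, lambda) = J_r(theta) - lambda * J_c(theta)."""
    reward, cost = evaluate(theta)
    return reward - lam * cost

theta, lam = 0.0, 0.0
for _ in range(500):
    # Primal ascent on the Lagrangian via a central finite difference.
    eps = 1e-4
    grad = (penalized(theta + eps, lam) - penalized(theta - eps, lam)) / (2 * eps)
    theta += PRIMAL_LR * grad
    # Dual ascent: raise lambda when cost exceeds the limit; keep lambda >= 0.
    _, cost = evaluate(theta)
    lam = max(0.0, lam + DUAL_LR * (cost - COST_LIMIT))

# Expect theta near 0.7 (cost pinned at the limit) and lambda near 0.6.
print(f"theta = {theta:.2f}, lambda = {lam:.2f}")
```

The dual step increases the penalty only while the estimated cost exceeds the limit, which is the common thread in the Lagrangian-based methods cited above.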