2021 · Preprint
DOI: 10.48550/arxiv.2102.08539

Separated Proportional-Integral Lagrangian for Chance Constrained Reinforcement Learning

Abstract: Safety is essential for reinforcement learning (RL) applied to real-world tasks such as autonomous driving. Chance constraints, which guarantee the satisfaction of state constraints with high probability, are suitable for representing the requirements of real-world environments with uncertainty. Existing chance-constrained RL methods, such as the penalty method and the Lagrangian method, either exhibit periodic oscillations or fail to satisfy the constraints. In this paper, we address these shortcomings by proposing a separat…
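For readers skimming the abstract, the chance-constrained RL problem it refers to can be written in a standard form (this formulation is illustrative, not quoted from the paper; $J$, $S_{\text{safe}}$, and $\delta$ are generic notation):

\[
\max_{\pi}\; J(\pi) = \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{T}\gamma^{t} r(s_t,a_t)\right]
\quad\text{s.t.}\quad
\Pr\!\left(s_t \in S_{\text{safe}},\ \forall\, t \le T\right) \ge 1-\delta .
\]

The Lagrangian method mentioned in the abstract relaxes the constraint with a multiplier $\lambda \ge 0$, giving $\mathcal{L}(\pi,\lambda) = J(\pi) - \lambda\,\bigl[(1-\delta) - \Pr(s_t \in S_{\text{safe}},\ \forall\, t \le T)\bigr]$, and alternates policy updates with multiplier updates; oscillation of $\lambda$ under this scheme is one of the shortcomings the paper targets. The title suggests the multiplier is instead driven by a proportional-integral (PI) controller on the constraint violation. The sketch below illustrates that idea only; the gains `k_p`, `k_i`, the threshold `delta`, and the helper `estimate_violation_prob` are hypothetical names, not taken from the paper.

```python
# Illustrative PI-controlled Lagrange multiplier for a chance constraint.
# All names and hyperparameters here are placeholders, not the paper's.

class PIMultiplier:
    def __init__(self, k_p=0.5, k_i=0.1, delta=0.05):
        self.k_p = k_p        # proportional gain
        self.k_i = k_i        # integral gain
        self.delta = delta    # allowed violation probability (1 - required safety level)
        self.integral = 0.0   # accumulated constraint-violation error

    def update(self, violation_prob):
        # Positive error means the policy violates the state constraint
        # more often than the chance constraint allows.
        error = violation_prob - self.delta
        # Keep the integral non-negative so the multiplier can relax back
        # toward zero once the constraint is satisfied.
        self.integral = max(0.0, self.integral + error)
        return max(0.0, self.k_p * error + self.k_i * self.integral)


# Schematic use inside a constrained policy-gradient loop:
#   lam = multiplier.update(estimate_violation_prob(policy))
#   loss = -expected_return(policy) + lam * estimate_violation_prob(policy)
```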

Cited by 1 publication (2 citation statements) · References 14 publications (26 reference statements)
“…However, the framework is limited to linear systems with additive uncertainty. Recently, in [24] the authors present an approach to address high probability constraint satisfaction based on the augmented lagrangian. However, the penalty term presented does not provide information about the quality of control selection (i.e.…”
Section: Safe Reinforcement Learning (mentioning)
Confidence: 99%
“…A number of RL-based methodologies have been proposed to ensure operational constraints are satisfied with high probability [26,25,24]. Other works have been proposed to consider the process-model mismatch that exists when learning an RL policy offline [10,13,12].…”
Section: Contribution (mentioning)
Confidence: 99%