2021 IEEE Intelligent Vehicles Symposium (IV)
DOI: 10.1109/iv48863.2021.9575205

Separated Proportional-Integral Lagrangian for Chance Constrained Reinforcement Learning

Cited by 9 publications (4 citation statements)
References 15 publications
“…For our SAC integration, the gradient of the multiplicative value function already naturally results in something similar to a Lagrange multiplier. Commonly, the safety constraint is formulated as a constraint budget over the expected safety cost, e.g., [10], [20], [21], [23], [12], [11]. However, only considering the expected safety at each time step can cause the realized cost to exceed the constraint budget [22].…”
Section: Related Work
confidence: 99%
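The constraint-budget formulation described above is commonly handled through a Lagrangian relaxation, with a multiplier trading off return against expected safety cost. A minimal sketch of that pattern in Python; the names (dual_ascent_step, cost_limit, lambda_lr) are chosen here for illustration and are not taken from any of the cited papers:

import numpy as np

def lagrangian_objective(returns, costs, lam, cost_limit):
    # Lagrangian relaxation of the constrained objective:
    # maximize E[return] - lam * (E[cost] - cost_limit).
    return np.mean(returns) - lam * (np.mean(costs) - cost_limit)

def dual_ascent_step(lam, costs, cost_limit, lambda_lr=0.05):
    # Plain dual ascent on the multiplier: increase lam while the
    # expected cost exceeds the budget, decay it toward zero otherwise.
    violation = np.mean(costs) - cost_limit
    return max(0.0, lam + lambda_lr * violation)

The multiplier grows while the cost estimate sits above the budget and shrinks toward zero once the constraint is satisfied; the PID variants discussed next refine exactly this feedback behaviour.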
“…We tackle this issue by using reachability analysis and imposing a zero constraint-violation probability in our experiments. Another line of research improves the stability of the learning process by using derivatives and integrals of the constraint function, yielding a PID approach [12], [23]. Similarly, our multiplicative value function improves stability by simplifying the learning task.…”
Section: Related Work
confidence: 99%
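The PID treatment of the multiplier mentioned in [12], [23] replaces plain dual ascent with proportional, integral, and derivative terms on the constraint violation. A rough sketch of such an update; the gains kp, ki, kd and the state dictionary are illustrative assumptions, not the exact scheme of the cited works or of the separated variant proposed in this paper:

def pid_multiplier_update(episode_cost, cost_limit, state,
                          kp=0.1, ki=0.01, kd=0.05):
    # One PID-style update of the Lagrange multiplier. `state` carries the
    # running integral and the previous violation for the derivative term.
    violation = episode_cost - cost_limit
    state["integral"] = max(0.0, state["integral"] + ki * violation)
    derivative = max(0.0, violation - state["prev_violation"])
    state["prev_violation"] = violation
    # Clip at zero so the safety penalty never turns into a reward bonus.
    lam = max(0.0, kp * violation + state["integral"] + kd * derivative)
    return lam, state

Initialise with state = {"integral": 0.0, "prev_violation": 0.0}; the integral term removes steady-state violation while the derivative term damps the oscillations that pure dual ascent tends to produce.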
“…As an alternative to risk metrics, previous works have directly bounded the probability of a system violating a constraint; such bounds are known as chance constraints. They have been used for MDPs [23], partially observable Markov decision processes (POMDPs) [24], and RL [25]. However, these approaches do not account for how robustly a controller satisfies its constraints, as we do in this paper.…”
Section: Literature Review
confidence: 99%
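For reference, a chance constraint bounds the probability of violating the constraint along a trajectory rather than an expected cost. In generic notation (the symbols below are chosen here and are not those used in [23]-[25]):

\[
\max_{\pi}\; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{T} \gamma^{t} r(s_t, a_t)\right]
\quad \text{s.t.} \quad
\Pr\nolimits_{\pi}\!\left(\exists\, t \le T : s_t \in \mathcal{S}_{\mathrm{unsafe}}\right) \le \delta,
\]

as opposed to the expected-cost form \(\mathbb{E}_{\pi}\!\left[\sum_{t=0}^{T} c(s_t, a_t)\right] \le d\).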
“…In this sense, developing a robust motion planning methodology is meaningful for stochastic multi-agent systems, such that the agents exhibit good robustness properties in the presence of other moving agents and static obstacles. One effective method is to characterize the uncertainties in a probabilistic manner and find the optimal sequence of control inputs subject to chance constraints [20], [21]. Thus, a risk bound can be stipulated in the chance constraints.…”
Section: Introduction
confidence: 99%
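One common way to enforce such a risk bound in sampling-based motion planning is to estimate the violation probability by Monte Carlo over the uncertain motions of the other agents. A minimal sketch, with hypothetical function names and a simple distance-based collision test standing in for whatever uncertainty model the cited works actually use:

import numpy as np

def collides(ego_traj, obstacle_traj, safe_dist=1.0):
    # True if the planned ego trajectory (shape [T, 2]) ever comes closer
    # than safe_dist to this sampled obstacle trajectory (same shape).
    return bool(np.any(np.linalg.norm(ego_traj - obstacle_traj, axis=-1) < safe_dist))

def satisfies_risk_bound(ego_traj, sampled_obstacle_trajs, risk_bound=0.05):
    # Monte Carlo estimate of the collision probability over sampled obstacle
    # futures, compared against the risk bound stipulated in the chance constraint.
    hits = sum(collides(ego_traj, obs) for obs in sampled_obstacle_trajs)
    return hits / len(sampled_obstacle_trajs) <= risk_bound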