2022 American Control Conference (ACC)
DOI: 10.23919/acc53348.2022.9867805
Convergence and optimality of policy gradient primal-dual method for constrained Markov decision processes

Cited by 30 publications (77 citation statements). References 6 publications.
“…As a result, these methods aim to bound the ℓ2 distance to the feasible set, and the bounds scale with the ℓ2 norm of the reward vector. This deviates from the more common, and perhaps more natural, formulation for constrained MDPs studied in other works [Efroni et al., 2020, Brantley et al., 2020, Ding et al., 2021]. Here, each component of the loss vector is within a given range (e.g.…”
Section: Reinforcement Learning in Constrained MDPs
Confidence: 83%
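A rough way to see the contrast drawn in this statement, in assumed notation (V_c(π) denotes the vector of expected cumulative costs under policy π, and 𝒞 the feasible set; neither symbol is taken from the cited works): the first family of results bounds the Euclidean distance to the feasible set,

    \mathrm{dist}_2(V_c(\pi), \mathcal{C}) = \min_{u \in \mathcal{C}} \| V_c(\pi) - u \|_2 \le \varepsilon,

while the per-component formulation requires each coordinate to lie in a given range,

    b_j^{-} \le V_{c_j}(\pi) \le b_j^{+}, \qquad j = 1, \dots, m.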
“…Non-asymptotic analysis of (natural) policy gradient methods. Moving beyond tabular MDPs, finite-time convergence guarantees of PG / NPG methods and their variants have recently been studied for control problems (e.g., [18,19,44,58]), regularized MDPs (e.g., [11,24,54]), constrained MDPs (e.g., [15,50]), robust MDPs (e.g., [29,60]), MDPs with function approximation (e.g., [1,2,10,25,30,45]), Markov games (e.g., [13,14,46,49,61]), and their use in actor-critic methods (e.g., [3,12,48,51]).…”
Section: Other Related Work
Confidence: 99%
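To make the "policy gradient primal-dual" label in the paper's title concrete, here is a minimal REINFORCE-style sketch of a generic primal-dual update for a constrained MDP. It is an illustration under assumptions (toy tabular MDP, softmax policy, fixed step sizes, single-rollout gradient estimates), not the algorithm analyzed in the paper or in the works cited above:

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy tabular constrained MDP (all numbers are assumptions for illustration):
    # maximize V_r(pi) subject to V_c(pi) <= b, via the Lagrangian
    # L(theta, lam) = V_r(pi_theta) - lam * (V_c(pi_theta) - b).
    S, A, H, gamma = 3, 2, 20, 0.9
    P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a] = next-state distribution
    R = rng.uniform(0.0, 1.0, size=(S, A))       # reward r(s, a)
    C = rng.uniform(0.0, 1.0, size=(S, A))       # cost c(s, a)
    b = 4.0                                      # cost budget

    theta = np.zeros((S, A))                     # softmax policy parameters
    lam = 0.0                                    # dual variable (Lagrange multiplier)
    eta_theta, eta_lam = 0.05, 0.01              # primal / dual step sizes

    def policy(s):
        p = np.exp(theta[s] - theta[s].max())    # numerically stable softmax
        return p / p.sum()

    def rollout():
        """Sample one H-step episode from state 0; return the trajectory."""
        s, traj = 0, []
        for _ in range(H):
            a = rng.choice(A, p=policy(s))
            traj.append((s, a, R[s, a], C[s, a]))
            s = rng.choice(S, p=P[s, a])
        return traj

    for it in range(2000):
        traj = rollout()
        G_r = sum(gamma**t * r for t, (_, _, r, _) in enumerate(traj))  # reward return
        G_c = sum(gamma**t * c for t, (_, _, _, c) in enumerate(traj))  # cost return
        # REINFORCE estimate of grad_theta L = grad V_r - lam * grad V_c.
        grad = np.zeros_like(theta)
        for s, a, _, _ in traj:
            g = -policy(s)                       # grad of log-softmax: e_a - pi(.|s)
            g[a] += 1.0
            grad[s] += g * (G_r - lam * G_c)
        theta += eta_theta * grad                          # primal gradient ascent
        lam = max(0.0, lam + eta_lam * (G_c - b))          # projected dual update

    print(f"lambda = {lam:.3f}, last sampled cost return = {G_c:.3f}")

The dual variable rises while the sampled cost return exceeds the budget and decays toward zero otherwise; this coupled ascent-descent dynamic is the mechanism whose convergence and optimality such papers analyze.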
“…RL with constraints: First, constraints that require some expected cumulative costs over all steps to be bounded have been widely studied in safe RL [19,20,21,8,22,23,24,9,25,26,10,27,28,11,29,30]. Second, many other works, e.g., [31] and [32], studied budget constraints that halt the learning process whenever the budget has run out.…”
Section: Related Work
Confidence: 99%
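In assumed notation (not drawn verbatim from any of the cited works), the first family of constraints mentioned above, on expected cumulative costs, is typically written as

    \max_{\pi} \; \mathbb{E}_{\pi}\Bigl[\sum_{t=0}^{\infty} \gamma^{t} r(s_t, a_t)\Bigr]
    \quad \text{s.t.} \quad \mathbb{E}_{\pi}\Bigl[\sum_{t=0}^{\infty} \gamma^{t} c_j(s_t, a_t)\Bigr] \le b_j, \quad j = 1, \dots, m,

where r is the reward, the c_j are cost functions, and the b_j are the corresponding budgets. The budget constraints of the second family differ in that exhausting a budget terminates learning rather than merely penalizing the policy.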