2022
DOI: 10.48550/arxiv.2205.10715
Preprint
Policy-based Primal-Dual Methods for Convex Constrained Markov Decision Processes

Abstract: We study convex Constrained Markov Decision Processes (CMDPs) in which the objective is concave and the constraints are convex in the state-action visitation distribution. We propose a policy-based primal-dual algorithm that updates the primal variable via policy gradient ascent and updates the dual variable via projected sub-gradient descent. Despite the loss of additivity structure and the nonconvex nature, we establish the global convergence of the proposed algorithm by leveraging a hidden convexity in the …
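The primal-dual template described in the abstract can be sketched on a toy problem. The following is a minimal illustration, not the paper's algorithm: it uses a hypothetical one-step CMDP (a constrained bandit with made-up reward vector `r`, cost vector `c`, and `budget`), where the state-action visitation distribution reduces to the policy itself, and it uses the exact softmax policy gradient in place of sampled estimates.

```python
import numpy as np

# Hypothetical one-step CMDP: maximize expected reward subject to
# expected cost <= budget.  Instance chosen only for illustration.
r = np.array([1.0, 0.5])      # per-action reward
c = np.array([1.0, 0.1])      # per-action cost
budget = 0.4                  # constraint: expected cost <= budget

def softmax(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

theta = np.zeros(2)           # primal variable: policy parameters
lam = 0.0                     # dual variable: Lagrange multiplier
T = 5000
avg_pi = np.zeros(2)

for t in range(T):
    eta = 0.5 / np.sqrt(t + 1)        # diminishing step size
    pi = softmax(theta)
    avg_pi += pi
    adv = r - lam * c                 # Lagrangian "advantage"
    # Primal update: policy gradient ascent on the Lagrangian
    # (exact softmax policy gradient of  pi @ adv  w.r.t. theta)
    theta += eta * pi * (adv - pi @ adv)
    # Dual update: projected sub-gradient descent, projecting onto lam >= 0
    lam = max(0.0, lam + eta * (pi @ c - budget))

avg_pi /= T  # averaged policy; its expected cost hovers near the budget
```

In the general multi-step setting the paper considers, the objective and constraints are concave/convex in the occupancy measure rather than linear, and the exact gradient above would be replaced by policy-gradient estimates from trajectories; averaging the iterates, as done here, is the standard device for extracting a feasible near-optimal policy from oscillating primal-dual dynamics.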

Cited by 1 publication (1 citation statement). References 16 publications (52 reference statements).
“…We notice that recent last-iterate convergence result for convex-concave saddle-point problems [24] is also applicable to Problem (17), which provides the optimal rate without problem-dependent constants. It is worth mentioning that direct application of such last-iterate convergence results in convex minimax optimization to constrained MDPs with general utilities [9,111] and convex MDPs [120,116] in occupancy-measure space is also straightforward. We omit these exercises in this paper, and focus on the design and analysis of algorithms in policy space.…”
Section: B.4 Constrained MDPs in Occupancy-Measure Space
confidence: 99%