2022
DOI: 10.48550/arxiv.2201.05756
Preprint

Block Policy Mirror Descent

Abstract: In this paper, we present a new class of policy gradient (PG) methods, namely the block policy mirror descent (BPMD) methods for solving a class of regularized reinforcement learning (RL) problems with (strongly) convex regularizers. Compared to the traditional PG methods with batch update rule, which visit and update the policy for every state, BPMD methods have cheap per-iteration computation via a partial update rule that performs the policy update on a sampled state. Despite the nonconvex nature of the pro…
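
To make the partial update rule concrete, here is a minimal tabular sketch, my own illustration and not the authors' algorithm or code: it assumes entropy regularization with weight tau, a KL-divergence proximal term, exact evaluation of the regularized Q-function under the current policy, and a cost-minimization convention; the names (`bpmd_step`, `eta`, `tau`) are hypothetical.

```python
# Minimal, illustrative BPMD-style partial update (a sketch under the assumptions
# stated above, not the paper's implementation).
import numpy as np

def bpmd_step(policy, q_values, state, eta, tau):
    """Update the policy distribution at a single sampled `state` only.

    policy   : (num_states, num_actions) array of action probabilities
    q_values : (num_states, num_actions) regularized state-action values of the
               current policy, assumed to come from a separate evaluation step
    eta      : stepsize; tau : entropy-regularization weight
    """
    # A KL-proximal step with entropy regularization has the closed form
    #   pi_new(a|s) ∝ pi(a|s)^{1/(1+eta*tau)} * exp(-eta * Q(s,a) / (1+eta*tau))
    # (cost-minimization convention; flip the sign of q_values for rewards).
    logits = (np.log(policy[state]) - eta * q_values[state]) / (1.0 + eta * tau)
    logits -= logits.max()                   # numerical stability
    row = np.exp(logits)
    policy[state] = row / row.sum()          # only this state's row changes
    return policy

# Usage sketch: sample one state per iteration instead of sweeping every state.
rng = np.random.default_rng(0)
num_states, num_actions = 10, 4
policy = np.full((num_states, num_actions), 1.0 / num_actions)
for _ in range(100):
    s = rng.integers(num_states)                        # sampled state
    q = rng.standard_normal((num_states, num_actions))  # placeholder for evaluation
    policy = bpmd_step(policy, q, s, eta=1.0, tau=0.01)
```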

Cited by 3 publications (5 citation statements)
References 17 publications
“…Only recently, linear convergence of the policy gradient method was established in [13,26] for MDPs without any regularization, by proposing an approximate policy mirror descent method. Policy gradient methods with a block update rule, which evaluate and update the policy for only a subset of states, have also been proposed in [14], with similar convergence properties. With softmax policy parameterization, linear convergence of the natural policy gradient method is also established in [12] when using adaptive stepsizes.…”
Section: Introduction
mentioning, confidence: 99%
“…(1) Convergence rate: Linear convergence of policy gradient methods has been discussed in prior literature [14,12,26], while existing superlinear convergence results often require strong algorithm-dependent assumptions [21,12]. It is unclear whether policy gradient methods converge superlinearly in general settings.…”
Section: Introduction
mentioning, confidence: 99%
“…By carefully solving a sequence of entropy-regularized MDPs, with diminishing regularization and increasing stepsizes, [21] proposes the first linearly converging PG method for unregularized MDPs. This is further simplified in [23,27,42], which drop the regularization while retaining the linear convergence. Beyond the optimality gap, convergence of the policy has been studied in [28].…”
Section: Introduction
mentioning, confidence: 99%
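
The citation statement above describes [21]'s scheme of solving a sequence of entropy-regularized MDPs with diminishing regularization and increasing stepsizes. The following is a rough, hypothetical schematic of such a schedule, not the algorithm from [21]; `solve_regularized_mdp` is a made-up placeholder for any regularized PG/PMD solver.

```python
def staged_pg(solve_regularized_mdp, initial_policy, num_stages=10,
              tau0=1.0, eta0=0.1):
    """Sketch of a diminishing-regularization schedule (hypothetical names).

    `solve_regularized_mdp(policy, tau, eta)` is an abstract placeholder that
    returns an approximately optimal policy for the MDP with entropy weight
    `tau`, run with stepsize `eta`, warm-started from `policy`.
    """
    policy, tau, eta = initial_policy, tau0, eta0
    for _ in range(num_stages):
        policy = solve_regularized_mdp(policy, tau, eta)  # warm start each stage
        tau *= 0.5   # diminishing regularization
        eta *= 2.0   # increasing stepsize
    return policy
```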
“…In this short note, we give the convergence analysis of the policy in the recent, well-known policy mirror descent (PMD) methods [5,12,8,6,11]. We mainly consider the unregularized setting following [11] with a generalized Bregman divergence.…”
mentioning, confidence: 99%