2019
DOI: 10.48550/arxiv.1906.01786
Preprint

Global Optimality Guarantees For Policy Gradient Methods

Abstract: Policy gradient methods are perhaps the most widely used class of reinforcement learning algorithms. These methods apply to complex, poorly understood control problems by performing stochastic gradient descent over a parameterized class of policies. Unfortunately, even for simple control problems solvable by classical techniques, policy gradient algorithms face non-convex optimization problems and are widely understood to converge only to local minima. This work identifies structural properties -- shared by fin…
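For concreteness, the following is a minimal sketch of the kind of method the abstract describes: REINFORCE-style stochastic gradient ascent over a tabular softmax policy on a randomly generated finite MDP. The environment, horizon, and step size are illustrative placeholders, not taken from the paper.

```python
# Minimal REINFORCE-style policy gradient sketch on a toy finite MDP.
# The dynamics, rewards, and hyperparameters below are illustrative
# placeholders, not anything from the paper under discussion.
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions, horizon = 3, 2, 20
# Hypothetical dynamics: P[s, a] is a distribution over next states,
# R[s, a] is the immediate reward.
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.standard_normal((n_states, n_actions))

theta = np.zeros((n_states, n_actions))  # softmax policy parameters


def policy(s):
    """Softmax policy pi(.|s) parameterized by theta[s]."""
    z = theta[s] - theta[s].max()
    p = np.exp(z)
    return p / p.sum()


def rollout():
    """Sample one trajectory and return a list of (state, action, reward)."""
    s, traj = 0, []
    for _ in range(horizon):
        probs = policy(s)
        a = rng.choice(n_actions, p=probs)
        traj.append((s, a, R[s, a]))
        s = rng.choice(n_states, p=P[s, a])
    return traj


alpha = 0.05
for episode in range(2000):
    traj = rollout()
    rewards = [r for _, _, r in traj]
    grad = np.zeros_like(theta)
    for t, (s, a, _) in enumerate(traj):
        G = sum(rewards[t:])        # return-to-go from step t
        probs = policy(s)
        glog = -probs               # grad of log-softmax: indicator(a) - pi(.|s)
        glog[a] += 1.0
        grad[s] += glog * G
    theta += alpha * grad           # stochastic gradient ascent step
```

The objective being ascended here is non-convex in theta even for this tabular parameterization, which is the setting in which the paper asks when stationary points are nonetheless globally optimal.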

Cited by 67 publications (108 citation statements)
References 18 publications (25 reference statements)
“…For more examples of sample complexity analysis for convergence to a stationary point, see for example [162,225,226,183,222]. The global optimality of stationary points was studied in [23] where they identified certain situations under which the policy gradient objective function has no sub-optimal stationary points despite being non-convex.…”
Section: Discussion
confidence: 99%
“…In this work we are interested in estimating ∇H(d_θ) because it is essential for estimating ∇ρ(θ) [cf. (9)]. It is important to note, however, that Theorem 2 and Corollary 1 are of independent interest.…”
Section: Entropy and OIR Policy Gradient Theorems
confidence: 99%
“…Fix a policy parameter iterate θ_t at timestep t. The gradient ∇ρ(θ_t) [cf. (9)] with respect to the policy parameters θ of the OIR ρ(θ) [cf. (6)] evaluated at θ = θ_t satisfies…”
Section: Entropy and OIR Policy Gradient Theorems
confidence: 99%
“…To facilitate the understanding of theoretical aspects of policy gradient methods, canonical control problems of linear time-invariant (LTI) systems have been commonly used as benchmarks [8]-[12]. In particular, the linear quadratic regulator (LQR), one of the most fundamental optimal control problems, has recently regained significant research interest [8]-[11].…”
Section: Introduction
confidence: 99%
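As a rough illustration of the LQR benchmark mentioned in the citation above, the sketch below runs model-based gradient descent on the LQR cost over a static feedback gain K (with u = -Kx), using the standard closed-form policy-gradient expression for discrete-time LQR. The system matrices, initial gain, and step size are made-up toy values, not taken from the cited works, and the step size is assumed small enough that the iterates stay stabilizing.

```python
# Illustrative sketch: model-based policy gradient descent on a toy LQR instance,
# treating the static feedback gain K (u = -K x) as the policy parameter.
# All matrices and hyperparameters are hypothetical examples.
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

A = np.array([[0.9, 0.1],
              [0.0, 0.9]])          # open-loop dynamics (stable toy system)
B = np.array([[0.0],
              [0.1]])
Q = np.eye(2)
R = np.eye(1)
Sigma0 = np.eye(2)                  # covariance of the random initial state


def lqr_cost_and_grad(K):
    """Exact cost and policy gradient for u = -K x on the toy system."""
    Acl = A - B @ K
    # Value matrix P_K: solves Acl' P Acl - P + (Q + K' R K) = 0
    P = solve_discrete_lyapunov(Acl.T, Q + K.T @ R @ K)
    # State covariance Sigma_K: solves Acl Sigma Acl' - Sigma + Sigma0 = 0
    Sigma = solve_discrete_lyapunov(Acl, Sigma0)
    cost = np.trace(P @ Sigma0)
    grad = 2.0 * ((R + B.T @ P @ B) @ K - B.T @ P @ A) @ Sigma
    return cost, grad


K = np.zeros((1, 2))                # stabilizing initial gain for this toy system
eta = 0.01
for _ in range(500):
    cost, grad = lqr_cost_and_grad(K)
    K = K - eta * grad              # gradient descent on the nonconvex LQR cost
```

Although the cost is nonconvex in K, LQR is exactly the kind of benchmark where gradient descent over policy parameters can be analyzed and shown to reach the global optimum, which is why it recurs in this literature.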