2021
DOI: 10.48550/arxiv.2104.04901
Preprint

Global Convergence of Policy Gradient Primal-dual Methods for Risk-constrained LQRs

Abstract: While the techniques in optimal control theory are often model-based, the policy optimization (PO) approach can directly optimize the performance metric of interest without explicit dynamical models, and is an essential approach for reinforcement learning problems. However, it typically leads to a non-convex optimization problem, for which there is little theoretical understanding of its performance. In this paper, we focus on the risk-constrained Linear Quadratic Regulator (LQR) problem with noisy inp…
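To make the primal-dual policy-gradient idea sketched in the abstract concrete, below is a minimal, model-based sketch (not the paper's algorithm): the objective is the infinite-horizon average LQR cost of a linear state-feedback gain, and a second quadratic cost with an assumed budget stands in for the paper's risk constraint. All matrices, step sizes, and the budget are illustrative assumptions.

```python
# A minimal sketch, NOT the paper's algorithm: model-based policy-gradient
# primal-dual iteration for an LQR objective with one auxiliary quadratic
# cost standing in for the paper's risk constraint.
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

# Illustrative problem data (all numbers are assumptions).
A = np.array([[1.0, 0.2],
              [0.0, 1.0]])
B = np.array([[0.0],
              [1.0]])
W = 0.1 * np.eye(2)                          # process-noise covariance
Q, R = np.eye(2), np.array([[1.0]])          # objective weights
Qc, Rc = 2.0 * np.eye(2), np.zeros((1, 1))   # surrogate constraint weights
budget = 3.0                                 # constraint level (assumed)


def stable(K):
    """Closed loop A - B K is Schur stable."""
    return np.max(np.abs(np.linalg.eigvals(A - B @ K))) < 1.0


def cost_and_grad(K, Qw, Rw):
    """Average cost tr(P W) of u = -K x and its policy gradient (standard LQR formula)."""
    Ak = A - B @ K
    P = solve_discrete_lyapunov(Ak.T, Qw + K.T @ Rw @ K)  # value matrix
    Sigma = solve_discrete_lyapunov(Ak, W)                # stationary state covariance
    grad = 2.0 * ((Rw + B.T @ P @ B) @ K - B.T @ P @ A) @ Sigma
    return np.trace(P @ W), grad


K = np.array([[0.5, 0.8]])    # initial stabilizing gain (assumed)
lam = 0.0                     # dual variable
eta_K, eta_lam = 1e-2, 5e-2   # step sizes (assumed)

for _ in range(2000):
    _, gJ = cost_and_grad(K, Q, R)
    Jc, gJc = cost_and_grad(K, Qc, Rc)
    direction = gJ + lam * gJc                     # gradient of the Lagrangian in K
    step = eta_K
    while not stable(K - step * direction):        # backtrack to keep the loop stable
        step *= 0.5
    K = K - step * direction                       # primal gradient descent
    lam = max(0.0, lam + eta_lam * (Jc - budget))  # projected dual ascent

J, _ = cost_and_grad(K, Q, R)
Jc, _ = cost_and_grad(K, Qc, Rc)
print("objective:", J, "constraint value:", Jc, "multiplier:", lam)
```

With these particular numbers the surrogate constraint happens to stay slack, so the multiplier remains near zero and the loop behaves like plain policy gradient descent; a tighter, active constraint would exercise the same primal-descent/dual-ascent structure. The paper's actual setting (its risk measure, noise model, and convergence analysis) differs from this toy surrogate.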

Cited by 4 publications (3 citation statements) | References 36 publications (71 reference statements)
“…Remark 2. It is worth mentioning that PO has also been investigated in control problems with robustness and risk-sensitivity concerns other than the LEQG/H∞ settings discussed in this survey (see [50], [52], [55], [115], [124]–[129]).…”
Section: For Any
confidence: 99%
“…The results of [43] were extended to the infinite-horizon case in [44]. The performance of the policy gradient algorithm for risk-constrained Linear Quadratic Regulators was also studied in [45]. Predictive variance constraints have also been used as a measure of risk in portfolio optimization [46]; unlike in our paper, the noise there is limited to Gaussian distributions and the variance is taken with respect to linear stage costs.…”
Section: Related Work
confidence: 99%
“…This point of view was initiated when the LQR cost was shown to be gradient dominant [19], facilitating a global convergence guarantee for first-order methods on this problem despite its non-convexity. Since then, PO using first-order methods has been investigated for variants of the LQR problem, such as OLQR [20], the model-free setup [21], and risk-constrained LQR [22], to name a few. The gradient dominance property, however, is only known to hold with respect to the global optimum of the unconstrained case, and is not necessarily expected for general constrained LQR problems.…”
Section: Introduction
confidence: 99%
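For context on the gradient dominance property referenced in the last excerpt, here is a schematic statement together with the linear rate it yields for gradient descent. The constants λ and L are placeholders for problem-dependent quantities (the precise expressions are given in the works cited above, e.g. [19] in that excerpt), and the inequalities are understood over a sublevel set of stabilizing gains on which the cost C is smooth.

```latex
% Schematic only: gradient dominance (Polyak--Lojasiewicz-type) inequality for
% the LQR cost C(K), and the linear convergence it implies for gradient descent.
\begin{align}
  C(K) - C(K^\star)
    &\le \lambda \,\bigl\|\nabla C(K)\bigr\|_F^{2}
    && \text{for all stabilizing } K \text{ with } C(K) \le C(K_0), \\
  C(K_{t+1}) - C(K^\star)
    &\le \Bigl(1 - \tfrac{1}{2\lambda L}\Bigr)\bigl(C(K_t) - C(K^\star)\bigr)
    && \text{for } K_{t+1} = K_t - \tfrac{1}{L}\,\nabla C(K_t),
\end{align}
```

where λ > 0 collects problem data (e.g. σ_min(R) and the smallest eigenvalue of the initial-state or noise covariance) and L is a smoothness constant of C on that sublevel set; the second line follows from the first by the standard descent-lemma argument. As the excerpt notes, this property is tied to the unconstrained global optimum, which is why constrained variants such as the risk-constrained problem above require a separate analysis.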