2022
DOI: 10.48550/arxiv.2202.11659
Preprint

Globally Convergent Policy Search over Dynamic Filters for Output Estimation

Abstract: We introduce the first direct policy search algorithm which provably converges to the globally optimal dynamic filter for the classical problem of predicting the outputs of a linear dynamical system, given noisy, partial observations. Despite the ubiquity of partial observability in practice, theoretical guarantees for direct policy search algorithms, one of the backbones of modern reinforcement learning, have proven difficult to achieve. This is primarily due to the degeneracies which arise when optimizing ov…

Cited by 4 publications (15 citation statements)
References 21 publications
“…In addition to (23), the proof idea of using the change of variables (18) can be applied to other output feedback control problems to establish connectivity of their strict sublevel sets. For example, we can consider an H2 formulation of the LQG control [16] as follows…”
Section: Revisit Sublevel Sets in LQG and H∞ Control
confidence: 99%
“…The above condition is not convex in K and P. However, we can use the same change of variables as (18) in the main text. A controller K ∈ L_γ can be constructed if ∃(X, Y, Â, B, Ĉ, Γ) such that the following LMI holds,…”
Section: Appendix
confidence: 99%
“…For real-world control applications, however, we may only have access to partial output measurements. In the output feedback case, the theoretical results for direct policy search are much fewer and far less complete [14]- [18]. It remains unclear whether model-free policy gradient methods can be modified to yield global convergence guarantees.…”
Section: Introduction
confidence: 99%
“…The Hessian of J2(K) at the optimal controller K⋆ is positive semidefinite and has eigenvalues λ1 = 8.1111 × 10^5, λ2 = 6133.9, λ3 = 131.2, λ4 = 6.36, λ5 = ⋯ = λ8 = 0. We further compute the matrices in (16) as follows. From the solutions to the Lyapunov equations (9a) and (9b) we can compute

(C X_op + V B^T K)(sI − A_cl^T)^{−1} Y_op B = (−12.5 s^3 − 604.2 s^2 − 1712 s − 566.7) / (s^4 + 6 s^3 + 11 s^2 + 6 s + 1), …”
confidence: 99%
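The computation quoted above certifies a Hessian is positive semidefinite and passes through solutions of Lyapunov equations (9a) and (9b). A minimal numerical sketch of both steps is below, using a made-up stable closed-loop matrix `A_cl` (an assumption for illustration, not the system from the paper):

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# Hypothetical stable closed-loop matrix (NOT from the paper; both
# eigenvalues sit at -1, so the Lyapunov equation has a unique solution).
A_cl = np.array([[0.0, 1.0],
                 [-1.0, -2.0]])
Q = np.eye(2)

# Solve A_cl @ X + X @ A_cl.T = -Q, the Gramian-type Lyapunov
# equation shape appearing as (9a)/(9b) in the quoted computation.
X = solve_continuous_lyapunov(A_cl, -Q)

# Positive-semidefiniteness check via eigenvalues of the symmetric
# part, mirroring how the quoted text certifies the Hessian of J2.
eigs = np.linalg.eigvalsh((X + X.T) / 2)
print(np.all(eigs >= -1e-9))  # True
```

For a stable `A_cl` and positive definite `Q`, the solution `X` is guaranteed positive definite, so the eigenvalue check passes; for an indefinite Hessian the same check would report a negative eigenvalue.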
“…Therefore, according to [20, Theorem 4.2], the zero controller K = [0 0; 0 Λ] ∈ C2 with any stable Λ ∈ R^{2×2} is a stationary point. We compute the matrices in (16). Then, we can compute…”
confidence: 99%