2022
DOI: 10.48550/arxiv.2202.11659
Preprint

Globally Convergent Policy Search over Dynamic Filters for Output Estimation

Abstract: We introduce the first direct policy search algorithm which provably converges to the globally optimal dynamic filter for the classical problem of predicting the outputs of a linear dynamical system, given noisy, partial observations. Despite the ubiquity of partial observability in practice, theoretical guarantees for direct policy search algorithms, one of the backbones of modern reinforcement learning, have proven difficult to achieve. This is primarily due to the degeneracies which arise when optimizing ov…

Cited by 4 publications (15 citation statements)
References 21 publications
“…In addition to (23), the proof idea of using the change of variables (18) can be applied to other output feedback control problems to establish connectivity of their strict sublevel sets. For example, we can consider an H2 formulation of the LQG control [16] as follows…”
Section: Revisit Sublevel Sets in LQG and H∞ Control
confidence: 99%
“…The above condition is not convex in K and P. However, we can use the same change of variables as (18) in the main text. A controller K ∈ L_γ can be constructed if ∃(X, Y, Â, B, Ĉ, Γ) such that the following LMI holds,…”
Section: Appendix
confidence: 99%
“…For real-world control applications, however, we may only have access to partial output measurements. In the output feedback case, the theoretical results for direct policy search are much fewer and far less complete [14]- [18]. It remains unclear whether model-free policy gradient methods can be modified to yield global convergence guarantees.…”
Section: Introduction
confidence: 99%
“…The Hessian of J2(K) at the optimal controller K⋆ is positive semidefinite and has eigenvalues λ1 = 8.1111 × 10^5, λ2 = 6133.9, λ3 = 131.2, λ4 = 6.36, λ5 = ⋯ = λ8 = 0. We further compute the matrices in (16) as follows. From the solutions to the Lyapunov equations (9a) and (9b) we can compute

(C X_op + V B^T K)(sI − A_cl^T)^{−1} Y_op B = (−12.5 s^3 − 604.2 s^2 − 1712 s − 566.7) / (s^4 + 6 s^3 + 11 s^2 + 6 s + 1), …”
confidence: 99%
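The computation quoted above certifies a Hessian is positive semidefinite and passes through solutions of Lyapunov equations (9a) and (9b). A minimal numerical sketch of both steps is below, using a made-up stable closed-loop matrix `A_cl` (an assumption for illustration, not the system from the paper):

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# Hypothetical stable closed-loop matrix (NOT from the paper; both
# eigenvalues sit at -1, so the Lyapunov equation has a unique solution).
A_cl = np.array([[0.0, 1.0],
                 [-1.0, -2.0]])
Q = np.eye(2)

# Solve A_cl @ X + X @ A_cl.T = -Q, the Gramian-type Lyapunov
# equation shape appearing as (9a)/(9b) in the quoted computation.
X = solve_continuous_lyapunov(A_cl, -Q)

# Positive-semidefiniteness check via eigenvalues of the symmetric
# part, mirroring how the quoted text certifies the Hessian of J2.
eigs = np.linalg.eigvalsh((X + X.T) / 2)
print(np.all(eigs >= -1e-9))  # True
```

For a stable `A_cl` and positive definite `Q`, the solution `X` is guaranteed positive definite, so the eigenvalue check passes; for an indefinite Hessian the same check would report a negative eigenvalue.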
“…Therefore, according to [20, Theorem 4.2], the zero controller K = [0 0; 0 Λ] ∈ C2 with any stable Λ ∈ R^{2×2} is a stationary point. We compute the matrices in (16). Then, we can compute…”
confidence: 99%