2021
DOI: 10.48550/arxiv.2109.14429
Preprint

Minimal Expected Regret in Linear Quadratic Control

Abstract: We consider the problem of online learning in Linear Quadratic Control systems whose state transition and state-action transition matrices A and B may be initially unknown. We devise an online learning algorithm and provide guarantees on its expected regret. This regret at time T is upper bounded (i) by O((d_u + d_x) √(d_x T)) when A and B are unknown, (ii) by O(d_x^2 log(T)) if only A is unknown, and (iii) by O(d_x (d_u + d_x) log(T)) if only B is unknown and under some mild non-degeneracy condition (d_x and d_u denote the dimensions of the state and of the control input, respectively). …
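As a quick sanity check on how these regret bounds scale, here is a minimal Python sketch that evaluates the three orders from the abstract for concrete values. The dimensions and horizon are chosen arbitrarily for illustration, and all universal constants hidden by the O(·) notation are omitted:

```python
import math

def regret_bounds(d_x: int, d_u: int, T: int) -> dict:
    """Orders of the expected-regret upper bounds stated in the abstract.

    Constants are omitted; the returned values only illustrate scaling.
    """
    return {
        "A and B unknown": (d_u + d_x) * math.sqrt(d_x * T),  # O((d_u+d_x) sqrt(d_x T))
        "only A unknown":  d_x ** 2 * math.log(T),            # O(d_x^2 log T)
        "only B unknown":  d_x * (d_u + d_x) * math.log(T),   # O(d_x (d_u+d_x) log T)
    }

# Example: a system with 4 states, 2 inputs, over a horizon of 10,000 steps.
bounds = regret_bounds(d_x=4, d_u=2, T=10_000)
```

Note how the logarithmic rates in cases (ii) and (iii) are far smaller than the √T rate of case (i) for any sizable horizon, which is the qualitative message of the bounds.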

Cited by 2 publications (3 citation statements) | References 12 publications
“…The study of regret in online LQR was re-initiated by Abbasi-Yadkori and Szepesvári (2011), inspired by works in the RL community. Many works followed to propose algorithms which were computationally tractable (Ouyang et al., 2017; Dean et al., 2018; Abeille and Lazaric, 2018; Cohen et al., 2019; Faradonbeh et al., 2020; Jedra and Proutiere, 2021). Lower bounds on the regret of online LQR are presented in Simchowitz and Foster (2020); Cassel et al. (2020); Ziemann and Sandberg (2022).…”
Section: Related Work
confidence: 99%
“…In a closely related line of work, Dean et al. [2018] provide an O(T^{2/3}) regret bound for robust adaptive LQR control, drawing inspiration from classical methods in system identification and robust adaptive control. It has since been shown that certainty equivalent control, without robustness, can attain the (locally) minimax optimal O(√T) regret [Mania et al., 2019, Faradonbeh et al., 2020, Lale et al., 2020a, Jedra and Proutiere, 2021]. In particular, by providing nearly matching upper and lower bounds, Simchowitz and Foster [2020] refine this analysis and establish that the optimal rate, without taking system-theoretic quantities into account, is R_T = Θ(√(m^2 n T)).…”
Section: Related Work
confidence: 99%
“…The goal is to learn a linear gain K ∈ R^{m×n} such that the closed-loop system A + BK is stable, i.e., such that its spectral radius ρ(A + BK) is less than one. Many algorithms for online LQR require the existence of such a stabilizing gain to initialize the online learning policy [Simchowitz and Foster, 2020, Jedra and Proutiere, 2021]. Furthermore, stabilization is a problem of independent interest [Faradonbeh et al., 2018b].…”
Section: Introduction
confidence: 99%
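The stabilizing-gain condition in the statement above is easy to check numerically. The sketch below uses a hypothetical 2-state, 1-input system (the matrices are illustrative, not from any cited paper) and computes a stabilizing gain with a plain LQR Riccati fixed-point iteration, then verifies ρ(A + BK) < 1:

```python
import numpy as np

# Hypothetical 2-state, 1-input system; open loop is unstable (eigenvalue 1.2).
A = np.array([[1.2, 0.5],
              [0.0, 0.9]])
B = np.array([[0.0],
              [1.0]])
Q, R = np.eye(2), np.eye(1)  # standard quadratic state/input costs

# Solve the discrete algebraic Riccati equation by fixed-point iteration,
# then form the LQR gain K, so the control law is u_t = K x_t.
P = Q.copy()
for _ in range(500):
    G = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    P = Q + A.T @ P @ (A - B @ G)
K = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

# The closed loop A + BK is stable iff its spectral radius is below one.
rho = max(abs(np.linalg.eigvals(A + B @ K)))
```

Since (A, B) here is controllable, the Riccati iteration converges and the resulting closed loop is guaranteed stable, illustrating exactly the kind of gain that online LQR algorithms assume is available at initialization.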