2019
DOI: 10.48550/arxiv.1902.06223
Preprint

Learning Linear-Quadratic Regulators Efficiently with only $\sqrt{T}$ Regret

Abstract: We present the first computationally efficient algorithm with $O(\sqrt{T})$ regret for learning in Linear Quadratic Control systems with unknown dynamics. This resolves an open question of Tu (2018).
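For orientation, the regret being bounded is the standard LQ-control notion; the sketch below uses generic notation $(A, B, Q, R, J^\ast)$ that is ours, not copied from the paper, and the paper's exact definition may differ in details such as the comparator:

$$x_{t+1} = A x_t + B u_t + w_t, \qquad \mathrm{Regret}(T) = \sum_{t=1}^{T}\big(x_t^\top Q\, x_t + u_t^\top R\, u_t\big) - T\, J^\ast,$$

where $J^\ast$ is the optimal infinite-horizon average cost attainable when $(A, B)$ are known.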

Cited by 26 publications (80 citation statements)
References 4 publications
“…We remark that despite being linear, the Markov transition model P_h(·|x, a) can still have infinite degrees of freedom as the measure µ_h is unknown. This is a key difference from the linear quadratic regulator [1,18,4,3,15] or the recent work of Yang and Wang [50], whose transition models are completely specified by a finite-dimensional matrix such that the degrees of freedom are bounded.…”
Section: Linear Markov Decision Processes
confidence: 99%
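To make the contrast in this statement concrete (a hedged gloss; $\phi$ and $\mu_h$ follow the quoted notation, the rest is ours): in the linear-MDP setting the transition kernel factorizes through an unknown measure, whereas LQR dynamics are fully specified by a finite matrix pair:

$$P_h(\cdot \mid x, a) = \langle \phi(x, a), \mu_h(\cdot)\rangle \qquad \text{vs.} \qquad x_{t+1} = A x_t + B u_t + w_t,$$

so $\mu_h$ carries infinitely many degrees of freedom while $(A, B)$ has only finitely many.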
“…Faradonbeh et al. [15] argue that certainty equivalence control with an epsilon-greedy-like scheme achieves O(√T) regret, though their work does not provide any explicit dependencies on instance parameters. Finally, Cohen et al. [9] also give an efficient algorithm based on semidefinite programming that achieves O(√T) regret. The literature for LQG is less complete, with most of the focus on the estimation side.…”
Section: Related Work
confidence: 99%
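The certainty-equivalence scheme mentioned in this statement admits a compact sketch: estimate $(A, B)$ by least squares, solve the Riccati equation for the estimated model, and add exploration noise. The snippet below is a minimal illustration under assumed interfaces (NumPy/SciPy); the function names are hypothetical and this is not the algorithm of [15] or the SDP-based method of [9]:

```python
import numpy as np
from scipy.linalg import solve_discrete_are

def estimate_dynamics(xs, us, xs_next):
    """Least-squares estimate of (A, B) from transition triples.

    xs: (T, n) states, us: (T, m) inputs, xs_next: (T, n) successor states.
    """
    Z = np.hstack([xs, us])                               # (T, n + m) regressors
    Theta, *_ = np.linalg.lstsq(Z, xs_next, rcond=None)   # (n + m, n) coefficients
    n = xs.shape[1]
    return Theta[:n].T, Theta[n:].T                       # A_hat (n, n), B_hat (n, m)

def ce_gain(A_hat, B_hat, Q, R):
    """Certainty-equivalence LQR gain from the Riccati solution of the estimated model."""
    P = solve_discrete_are(A_hat, B_hat, Q, R)
    return np.linalg.solve(R + B_hat.T @ P @ B_hat, B_hat.T @ P @ A_hat)

def eg_action(K, x, eps, rng):
    """Epsilon-greedy-style exploration: play the CE controller plus injected noise."""
    u = -K @ x
    return u + eps * rng.standard_normal(u.shape)
```

The semidefinite-programming approach of Cohen et al. [9] replaces the Riccati step with a convex relaxation; that is not shown here.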
“…A series of papers (e.g. [1,4,5,20]) consider a model where a linear system with unknown dynamics is perturbed by stochastic disturbances; an online learner picks control actions with the goal of minimizing regret against the optimal stabilizing controller. In the "non-stochastic control" setting proposed in [15], the learner knows the dynamics, but the disturbance may be generated adversarially; the controller seeks to minimize regret against the class of disturbance-action policies.…”
Section: Related Work
confidence: 99%
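As a hedged summary of the two benchmarks in this statement (notation is ours, not the cited papers'): in the stochastic model the learner competes with the best stabilizing linear controller, while in non-stochastic control the comparator is the class of disturbance-action policies,

$$\mathrm{Regret}(T) = \sum_{t=1}^{T} c_t(x_t, u_t) - \min_{\pi \in \Pi} \sum_{t=1}^{T} c_t\big(x_t^\pi, u_t^\pi\big), \qquad u_t^{M} = \sum_{i=1}^{H} M_i\, w_{t-i},$$

where $\Pi$ is the set of stabilizing linear controllers in the stochastic setting, and in the non-stochastic setting it is the set of disturbance-action policies parameterized by $M_1, \dots, M_H$.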