2021 American Control Conference (ACC)
DOI: 10.23919/acc50511.2021.9483309
Adaptive Control and Regret Minimization in Linear Quadratic Gaussian (LQG) Setting

Abstract: We study the problem of adaptive control in partially observable linear quadratic Gaussian control systems, where the model dynamics are unknown a priori. We propose LqgOpt, a novel reinforcement learning algorithm based on the principle of optimism in the face of uncertainty, to effectively minimize the overall control cost. We employ the predictor state evolution representation of the system dynamics and deploy a recently proposed closed-loop system identification method, estimation, and confidence bound con…
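The abstract gives only a high-level description of LqgOpt. As a rough illustration of the optimism-in-the-face-of-uncertainty (OFU) principle it relies on, here is a minimal, hedged sketch in Python. It is not the paper's algorithm: it substitutes a fully observed LQR system for the partially observed LQG setting, ordinary least squares for the paper's closed-loop, predictor-form identification, and a crude sampled search over a shrinking ball for the paper's confidence-bound construction. Every name and constant below (lqr, estimate, ofu_select, the noise scales, the ball radius) is an illustrative assumption.

```python
# Hedged sketch of an OFU-style adaptive control loop (simplified to fully
# observed LQR so it runs end to end); not the paper's LqgOpt algorithm.
import numpy as np
from scipy.linalg import solve_discrete_are

rng = np.random.default_rng(0)

# Unknown "true" dynamics x_{t+1} = A x_t + B u_t + w_t (illustrative values).
A_true = np.array([[0.9, 0.2],
                   [0.0, 0.8]])
B_true = np.array([[0.0],
                   [1.0]])
Q, R = np.eye(2), np.eye(1)

def lqr(A, B):
    """LQR gain and an average-cost proxy (trace of the DARE solution) for a model."""
    P = solve_discrete_are(A, B, Q, R)
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return K, float(np.trace(P))

def estimate(X, U, Xn):
    """Ordinary least-squares estimate of (A, B) from state/input/next-state data."""
    Z = np.hstack([X, U])                       # regressors [x_t, u_t]
    Theta, *_ = np.linalg.lstsq(Z, Xn, rcond=None)
    Theta = Theta.T                             # rows are [A | B]
    return Theta[:, :2], Theta[:, 2:]

def ofu_select(A_hat, B_hat, radius, n_samples=200):
    """Optimism: among sampled models in a ball around the estimate,
    return the LQR gain of the model with the smallest optimal cost."""
    best_K = np.zeros((B_hat.shape[1], A_hat.shape[0]))  # fallback: zero gain
    best_cost = np.inf
    candidates = [(A_hat, B_hat)] + [
        (A_hat + rng.normal(scale=radius, size=A_hat.shape),
         B_hat + rng.normal(scale=radius, size=B_hat.shape))
        for _ in range(n_samples)
    ]
    for A_c, B_c in candidates:
        try:
            K, cost = lqr(A_c, B_c)
        except (np.linalg.LinAlgError, ValueError):
            continue                            # skip unstabilizable candidates
        if cost < best_cost:
            best_K, best_cost = K, cost
    return best_K

# Epoch-based loop: collect data, re-estimate, act with the optimistic controller.
x = np.zeros(2)
X, U, Xn = [], [], []
K = np.zeros((1, 2))                            # warm-up: pure exploration noise
for epoch in range(5):
    for _ in range(200):
        u = -K @ x + 0.1 * rng.normal(size=1)   # small excitation for identifiability
        x_next = A_true @ x + B_true @ u + 0.05 * rng.normal(size=2)
        X.append(x); U.append(u); Xn.append(x_next)
        x = x_next
    A_hat, B_hat = estimate(np.array(X), np.array(U), np.array(Xn))
    K = ofu_select(A_hat, B_hat, radius=1.0 / np.sqrt(len(X)))  # ball shrinks with data
    print(f"epoch {epoch}: ||A_hat - A_true||_F = {np.linalg.norm(A_hat - A_true):.3f}")
```

The epoch structure (re-estimate, pick the most optimistic model in the confidence region, then play its optimal controller for a while) is the part intended to mirror the OFU principle described in the abstract.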

Cited by 10 publications (4 citation statements). References: 18 publications.
“…A series of papers (e.g. [1,4,5,20]) consider a model where a linear system with unknown dynamics is perturbed by stochastic disturbances; an online learner picks control actions with the goal of minimizing regret against the optimal stabilizing controller. In the "non-stochastic control" setting proposed in [15], the learner knows the dynamics, but the disturbance may be generated adversarially; the controller seeks to minimize regret against the class of disturbance-action policies.…”
Section: Related Work
confidence: 99%
“…where we initialize P_{T−1} = 0. The matrices Ã_t, B_t, C_t, L_t, K_t, and Σ_t are defined in (19) and (20). The regret-optimal filter is the regret-suboptimal filter at level γ_opt, where γ_opt is the smallest value of γ such that Σ_t has the same inertia as the matrix [I 0; 0 −I].…”
Section: Regret-optimal Estimation
confidence: 99%
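The excerpt above characterizes the regret-optimal filter via the smallest level γ_opt at which Σ_t matches the inertia of [I 0; 0 −I]. As a hedged sketch of that search, the Python fragment below bisects over γ and checks the inertia condition. The construction of Σ_t from the cited paper's equations (19) and (20) is not reproduced, so `sigma_of_gamma` is a hypothetical stand-in, and the bisection assumes the inertia condition holds for all γ above some threshold.

```python
# Hedged sketch: find the smallest gamma whose Sigma matrix has the same
# inertia as blkdiag(I_p, -I_q); `sigma_of_gamma` is a hypothetical stand-in.
import numpy as np

def inertia(M, tol=1e-9):
    """Return (#positive, #negative, #zero) eigenvalues of a symmetric matrix."""
    eig = np.linalg.eigvalsh(M)
    return (int(np.sum(eig > tol)), int(np.sum(eig < -tol)),
            int(np.sum(np.abs(eig) <= tol)))

def has_target_inertia(M, p, q):
    """Check that M has the same inertia as blkdiag(I_p, -I_q)."""
    return inertia(M) == (p, q, 0)

def smallest_gamma(sigma_of_gamma, p, q, lo=1e-3, hi=1e3, iters=60):
    """Bisect for the smallest gamma satisfying the inertia condition,
    assuming the condition holds for every gamma above a single threshold."""
    assert has_target_inertia(sigma_of_gamma(hi), p, q), "increase hi"
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if has_target_inertia(sigma_of_gamma(mid), p, q):
            hi = mid
        else:
            lo = mid
    return hi

# Toy stand-in: Sigma(gamma) = blkdiag(I_p, (4 - gamma^2) * I_q); the inertia
# condition holds exactly for gamma > 2, so the search should return about 2.
p, q = 2, 2
sigma_toy = lambda g: np.diag(np.r_[np.ones(p), (4.0 - g**2) * np.ones(q)])
print(smallest_gamma(sigma_toy, p, q))   # ~2.0
```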
“…Remark 2. We note that some works on learning-based control have made assumptions that the underlying system is stable [28], or can be stabilized for all possible controls [15], when carrying out a regret analysis. This stands in contrast to Theorem 4, which merely requires that at least one of the M channels is stabilizing for the remote estimator, while some of the sub-optimal channels can cause the expected error covariance to diverge if they are used too often.…”
Section: Regret(T) = Θ(T)
confidence: 99%
“…Recent work [9] shows guarantees for related models in which the transition and observation dynamics are modeled by linear mixture models; however, their approach is computationally inefficient. We remark that there are further works tackling online RL in other POMDPs, such as LQG [33, 46], latent POMDPs [32], and reactive POMDPs [31, 26].…”
Section: Related Work
confidence: 99%