2020
DOI: 10.48550/arxiv.2007.12291
Preprint
Explore More and Improve Regret in Linear Quadratic Regulators

Abstract: Stabilizing the unknown dynamics of a control system and minimizing regret in control of an unknown system are among the main goals in control theory and reinforcement learning. In this work, we pursue both these goals for adaptive control of linear quadratic regulators (LQR). Prior works accomplish either one of these goals at the cost of the other one. The algorithms that are guaranteed to find a stabilizing controller suffer from high regret, whereas algorithms that focus on achieving low regret assume the p…

Cited by 8 publications (23 citation statements)
References 2 publications
“…i. Unlike related works on system identification and regret analysis [15,36,37,52,59], mean-square stability does not lead to strong high-probability bounds, as one can only bound x_t or x_t x_t^⊺ in expectation. Therefore, in Algorithm 1, we sample only bounded state-excitation pairs (x_t, z_t) on each mode i ∈ [s].…”
Section: System Identification for MJS
confidence: 99%
“…A commonly used control paradigm is the Linear Quadratic Regulator (LQR), which is theoretically well understood when system dynamics are linear and known. LQR also provides an interesting benchmark, when system dynamics are unknown, for reinforcement learning (RL) with continuous state and action spaces and for adaptive control [2,4,9,16,36,47].…”
Section: Introduction
confidence: 99%
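As background for the snippet above (an illustrative sketch, not code from the cited paper): when the dynamics (A, B) are known, the optimal LQR gain can be obtained by iterating the discrete-time Riccati equation to a fixed point. The example system below is an assumption chosen for illustration.

```python
import numpy as np

# Sketch: LQR with *known* dynamics x_{t+1} = A x_t + B u_t and cost
# sum_t (x_t' Q x_t + u_t' R u_t). The (A, B, Q, R) values are assumed
# for illustration; any stabilizable pair works.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = np.eye(1)

# Fixed-point iteration on the discrete Riccati equation.
P = Q.copy()
for _ in range(500):
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    P = Q + A.T @ P @ A - A.T @ P @ B @ K

K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
# The closed loop A - B K should be stable (spectral radius below 1).
rho = max(abs(np.linalg.eigvals(A - B @ K)))
print(rho)
```

This known-dynamics computation is the baseline against which regret is measured; the difficulty addressed by the surveyed works is performing comparably when (A, B) must be learned online.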
“…Most existing works assume that we have access to a stabilizer. Recent attempts to remove this assumption include [LAHA20]. There, for example, the authors assume that the algorithm knows that the system (A, B) belongs to a set of systems…”
Section: A Related Work
confidence: 99%
“…In [LAHA20], the authors provide another improvement on the OFU based algorithm of [AYS11]. Their goal is to improve the dependency of the regret upper bound on the dimension and to remove the assumption of having access to a stabilizing controller.…”
Section: OFU-based Algorithms
confidence: 99%
“…(3) Online LQR with Unknown Dynamics and Time-Invariant Costs: There is a recent line of research dealing with LQR control problems with unknown dynamics. Several techniques have been proposed, using (i) gradient estimation (see e.g., [31]–[34]), (ii) estimation of the dynamics matrices and derivation of the controller while accounting for the estimation uncertainty ([7], [8], [23], [35]–[37]), and (iii) wave-filtering [38], [39].…”
Section: A Related Work
confidence: 99%