Published: 2022
DOI: 10.1109/tac.2021.3087455
Convergence and Sample Complexity of Gradient Methods for the Model-Free Linear–Quadratic Regulator Problem

Cited by 59 publications (40 citation statements: 3 supporting, 37 mentioning, 0 contrasting)
References 33 publications
“…In light of the above discussion, the estimate L̂_λ(K) can be used to evaluate L_λ(K) provided the step size δ is sufficiently small, the horizon H sufficiently large, and the sample size N sufficiently large. This mimics the findings of Fazel et al. [2018], Mohammadi et al. [2021], and Malik et al. [2019] in various related settings.…”
Section: C1 Finite-sample Considerations (supporting)
confidence: 66%
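As a concrete illustration of the finite-sample evaluation described in this statement, the sketch below estimates the cost of a fixed feedback gain K by averaging N rollouts truncated at horizon H. It is a minimal, hypothetical example: the system matrices, the initial-state distribution, and the choices of H and N are illustrative assumptions, not taken from the cited works.

```python
import numpy as np

def estimate_lqr_cost(A, B, Q, R, K, H=200, N=100, sigma0=1.0, rng=None):
    """Monte Carlo estimate of the infinite-horizon LQR cost of u_t = -K x_t,
    truncated at horizon H and averaged over N random initial states."""
    rng = np.random.default_rng() if rng is None else rng
    n = A.shape[0]
    total = 0.0
    for _ in range(N):
        x = sigma0 * rng.standard_normal(n)      # random initial state x_0
        cost = 0.0
        for _ in range(H):
            u = -K @ x                           # linear state feedback
            cost += x @ Q @ x + u @ R @ u        # stage cost x'Qx + u'Ru
            x = A @ x + B @ u                    # closed-loop dynamics
        total += cost
    return total / N                             # empirical average over rollouts

# Tiny illustrative system (values are made up for this sketch).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q, R = np.eye(2), np.eye(1)
K = np.array([[1.0, 2.0]])                       # a stabilizing gain for this example
print(estimate_lqr_cost(A, B, Q, R, K))
```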
“…), r a user-defined smoothing parameter that scales the perturbation, and 1/N(r) a normalization constant. As in previous work (Fazel et al. [2018], Malik et al. [2019], Mohammadi et al. [2021]), one can argue that this yields an estimator of the gradient with polynomial sample complexity. As in prior work, r must be chosen sufficiently small so that the perturbations do not render A_K unstable.…”
Section: C1 Finite-sample Considerations (supporting)
confidence: 62%
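The construction this quote refers to can be sketched as a one-point, smoothed (zeroth-order) gradient estimator: perturb K by a random direction scaled by r, evaluate the cost, and average. The sketch below follows the standard uniform-on-the-sphere construction used in this line of work; the helper name, perturbation distribution, and parameter defaults are illustrative assumptions rather than a verbatim reproduction of any cited estimator.

```python
import numpy as np

def smoothed_gradient_estimate(cost_fn, K, r=0.05, N=50, rng=None):
    """One-point zeroth-order gradient estimate of cost_fn at K:
    average of (d / r) * cost_fn(K + r * U_i) * U_i over N random directions U_i
    drawn uniformly from the unit sphere (Frobenius norm), where d = K.size."""
    rng = np.random.default_rng() if rng is None else rng
    d = K.size
    grad = np.zeros_like(K)
    for _ in range(N):
        U = rng.standard_normal(K.shape)
        U /= np.linalg.norm(U)                    # uniform direction on the unit sphere
        grad += (d / r) * cost_fn(K + r * U) * U  # r must keep A - B(K + rU) stable
    return grad / N

# Example usage (with the estimate_lqr_cost sketch above):
#   grad = smoothed_gradient_estimate(lambda Kp: estimate_lqr_cost(A, B, Q, R, Kp), K)
```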
“…These studies show that the optimal control is a linear function of the state, with a gain that can be obtained by solving the Riccati equation (Anderson and Moore, 2007). Recent research focuses more on the model-free setting in the context of RL, where the algorithm does not know the dynamics and only has observations of states and rewards (Tu and Recht, 2018; Mohammadi et al., 2021).…”
Section: Introduction (mentioning)
confidence: 99%
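The model-based baseline mentioned in this statement, a linear optimal controller obtained from the Riccati equation, can be written in a few lines. The sketch below uses scipy.linalg.solve_discrete_are for the discrete-time algebraic Riccati equation, with made-up system matrices for illustration.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

def lqr_gain(A, B, Q, R):
    """Optimal gain for u_t = -K x_t via the discrete-time algebraic Riccati equation."""
    P = solve_discrete_are(A, B, Q, R)                   # stabilizing solution of the DARE
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)    # K = (R + B'PB)^{-1} B'PA
    return K, P

# Illustrative double-integrator-like system (not taken from the cited papers).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q, R = np.eye(2), np.eye(1)
K, P = lqr_gain(A, B, Q, R)
print("optimal gain K:", K)
```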
“…For example, Q-learning for discrete-time LQR problems was proposed in [4]. For policy gradient methods, global linear convergence to the global optimum was established in [7, 22]. To obtain structured policies, Structured Policy Iteration for LQR problems with a regularization term was proposed in [23], together with local linear convergence to a stationary point.…”
(mentioning)
confidence: 99%
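For the policy gradient results cited as [7, 22], the LQR cost admits a closed-form gradient, ∇C(K) = 2[(R + BᵀP_K B)K − BᵀP_K A]Σ_K, where P_K and Σ_K solve closed-loop Lyapunov equations. The sketch below runs plain gradient descent with this exact, model-based gradient; the system matrices, initial-state covariance, step size, and iteration count are ad hoc illustrative choices, not those of the cited papers.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def lqr_policy_gradient(A, B, Q, R, K, Sigma0):
    """Exact gradient of C(K) = E[sum_t x'Qx + u'Ru] for u_t = -K x_t, x_0 ~ (0, Sigma0)."""
    Acl = A - B @ K
    # P_K solves the closed-loop Lyapunov (Bellman) equation Acl' P Acl - P + Q + K'RK = 0.
    P = solve_discrete_lyapunov(Acl.T, Q + K.T @ R @ K)
    # Sigma_K = sum_t E[x_t x_t'] solves the dual equation Acl Sigma Acl' - Sigma + Sigma0 = 0.
    Sigma = solve_discrete_lyapunov(Acl, Sigma0)
    return 2 * ((R + B.T @ P @ B) @ K - B.T @ P @ A) @ Sigma

# Illustrative gradient descent on the gain (step size chosen ad hoc, no tuning implied).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q, R, Sigma0 = np.eye(2), np.eye(1), np.eye(2)
K = np.array([[1.0, 2.0]])                 # stabilizing initial gain for this example
for _ in range(500):
    K -= 1e-3 * lqr_policy_gradient(A, B, Q, R, K, Sigma0)
print("gain after gradient descent:", K)
```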