Published: 2022
DOI: 10.1109/tac.2021.3087455
Convergence and Sample Complexity of Gradient Methods for the Model-Free Linear–Quadratic Regulator Problem

Cited by 59 publications (40 citation statements: 3 supporting, 37 mentioning, 0 contrasting)
References 33 publications
“…In light of the above discussion, the estimate L̂_λ(K) can be used to evaluate L_λ(K) provided the step size δ is sufficiently small, the horizon H sufficiently large, and the sample size N sufficiently large. This mimics the findings of Fazel et al. [2018], Mohammadi et al. [2021], and Malik et al. [2019] in various related settings.…”
Section: C1 Finite-sample Considerations (supporting)
confidence: 66%
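As a concrete illustration of the finite-sample evaluation described in this statement, the sketch below estimates the cost of a fixed feedback gain K by averaging N rollouts truncated at horizon H. It is a minimal, hypothetical example: the system matrices, the initial-state distribution, and the choices of H and N are illustrative assumptions, not taken from the cited works.

```python
import numpy as np

def estimate_lqr_cost(A, B, Q, R, K, H=200, N=100, sigma0=1.0, rng=None):
    """Monte Carlo estimate of the infinite-horizon LQR cost of u_t = -K x_t,
    truncated at horizon H and averaged over N random initial states."""
    rng = np.random.default_rng() if rng is None else rng
    n = A.shape[0]
    total = 0.0
    for _ in range(N):
        x = sigma0 * rng.standard_normal(n)      # random initial state x_0
        cost = 0.0
        for _ in range(H):
            u = -K @ x                           # linear state feedback
            cost += x @ Q @ x + u @ R @ u        # stage cost x'Qx + u'Ru
            x = A @ x + B @ u                    # closed-loop dynamics
        total += cost
    return total / N                             # empirical average over rollouts

# Tiny illustrative system (values are made up for this sketch).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q, R = np.eye(2), np.eye(1)
K = np.array([[1.0, 2.0]])                       # a stabilizing gain for this example
print(estimate_lqr_cost(A, B, Q, R, K))
```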
“…), r a user-defined smoothing parameter that scales the perturbation, and 1/N(r) a normalization constant. As in previous work (Fazel et al. [2018], Malik et al. [2019], Mohammadi et al. [2021]), one can argue that this yields an estimator of the gradient with polynomial sample complexity. As in prior work, r must be chosen sufficiently small so that the perturbations do not render A_K unstable.…”
Section: C1 Finite-sample Considerations (supporting)
confidence: 62%
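The construction this quote refers to can be sketched as a one-point, smoothed (zeroth-order) gradient estimator: perturb K by a random direction scaled by r, evaluate the cost, and average. The sketch below follows the standard uniform-on-the-sphere construction used in this line of work; the helper name, perturbation distribution, and parameter defaults are illustrative assumptions rather than a verbatim reproduction of any cited estimator.

```python
import numpy as np

def smoothed_gradient_estimate(cost_fn, K, r=0.05, N=50, rng=None):
    """One-point zeroth-order gradient estimate of cost_fn at K:
    average of (d / r) * cost_fn(K + r * U_i) * U_i over N random directions U_i
    drawn uniformly from the unit sphere (Frobenius norm), where d = K.size."""
    rng = np.random.default_rng() if rng is None else rng
    d = K.size
    grad = np.zeros_like(K)
    for _ in range(N):
        U = rng.standard_normal(K.shape)
        U /= np.linalg.norm(U)                    # uniform direction on the unit sphere
        grad += (d / r) * cost_fn(K + r * U) * U  # r must keep A - B(K + rU) stable
    return grad / N

# Example usage (with the estimate_lqr_cost sketch above):
#   grad = smoothed_gradient_estimate(lambda Kp: estimate_lqr_cost(A, B, Q, R, Kp), K)
```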
“…These studies show that the optimal control is a linear function of the state, with a gain that can be obtained by solving the Riccati equation (Anderson and Moore, 2007). Recent research focuses more on the model-free setting in the context of RL, where the algorithm does not know the dynamics and only has observations of states and rewards (Tu and Recht, 2018; Mohammadi et al., 2021).…”
Section: Introduction (mentioning)
confidence: 99%
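The model-based baseline mentioned in this statement, a linear optimal controller obtained from the Riccati equation, can be written in a few lines. The sketch below uses scipy.linalg.solve_discrete_are for the discrete-time algebraic Riccati equation, with made-up system matrices for illustration.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

def lqr_gain(A, B, Q, R):
    """Optimal gain for u_t = -K x_t via the discrete-time algebraic Riccati equation."""
    P = solve_discrete_are(A, B, Q, R)                   # stabilizing solution of the DARE
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)    # K = (R + B'PB)^{-1} B'PA
    return K, P

# Illustrative double-integrator-like system (not taken from the cited papers).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q, R = np.eye(2), np.eye(1)
K, P = lqr_gain(A, B, Q, R)
print("optimal gain K:", K)
```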
“…For example, Q-learning for discrete-time LQR problems was proposed in [4]. For policy gradient methods, global linear convergence to the global optimum was established in [7, 22]. To obtain structured policies, Structured Policy Iteration for LQR problems with a regularization term was proposed in [23], together with local linear convergence to a stationary point.…”
(mentioning)
confidence: 99%
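For the policy gradient results cited as [7, 22], the LQR cost admits a closed-form gradient, ∇C(K) = 2[(R + BᵀP_K B)K − BᵀP_K A]Σ_K, where P_K and Σ_K solve closed-loop Lyapunov equations. The sketch below runs plain gradient descent with this exact, model-based gradient; the system matrices, initial-state covariance, step size, and iteration count are ad hoc illustrative choices, not those of the cited papers.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def lqr_policy_gradient(A, B, Q, R, K, Sigma0):
    """Exact gradient of C(K) = E[sum_t x'Qx + u'Ru] for u_t = -K x_t, x_0 ~ (0, Sigma0)."""
    Acl = A - B @ K
    # P_K solves the closed-loop Lyapunov (Bellman) equation Acl' P Acl - P + Q + K'RK = 0.
    P = solve_discrete_lyapunov(Acl.T, Q + K.T @ R @ K)
    # Sigma_K = sum_t E[x_t x_t'] solves the dual equation Acl Sigma Acl' - Sigma + Sigma0 = 0.
    Sigma = solve_discrete_lyapunov(Acl, Sigma0)
    return 2 * ((R + B.T @ P @ B) @ K - B.T @ P @ A) @ Sigma

# Illustrative gradient descent on the gain (step size chosen ad hoc, no tuning implied).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q, R, Sigma0 = np.eye(2), np.eye(1), np.eye(2)
K = np.array([[1.0, 2.0]])                 # stabilizing initial gain for this example
for _ in range(500):
    K -= 1e-3 * lqr_policy_gradient(A, B, Q, R, K, Sigma0)
print("gain after gradient descent:", K)
```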