Policy Optimization for Markovian Jump Linear Quadratic Control: Gradient Method and Global Convergence

Jansch-Porto, Joao Paulo; Hu, Bin; Dullerud, Geir E.

doi:10.1109/tac.2022.3176439

Cited by 4 publications

(8 citation statements)

References 36 publications

(47 reference statements)

Supporting

Mentioning

Contrasting

Order By: Relevance

“…The coercive property, compactness of the sublevel set, and L-smoothness of the cost function in the SOF problem, can be deemed as partially observed counterparts to the properties of the state-feedback LQR cost. The associated proofs follow similar lines as the state-feedback LQR case [12], [19]. Different from these properties, to the best of our knowledge, we are the first to establish the M -Lipschitz continuous Hessian in both SOF and state-feedback LQR problems.…”

Section: Gradients and Hessianmentioning

confidence: 65%

“…In this section, we give the analytical expression for both the gradient and Hessian. The derivations follow similar lines as the state-feedback LQR case [11], [19].…”

Section: Gradients and Hessianmentioning

confidence: 87%

“…Different from these properties, to the best of our knowledge, we are the first to establish the M -Lipschitz continuous Hessian in both SOF and state-feedback LQR problems. Notably, this property cannot be straightforwardly derived using methods akin to those employed for establishing L-smoothness [19]. This is because the analysis of ∇ 3 J(vec(K)) necessitates complicated tensor operations.…”

Section: Gradients and Hessianmentioning

confidence: 99%

“…This section presents new convergence findings for three variants of policy gradient methods applied to SOF: the vanilla policy gradient, the natural policy gradient, and the Gauss-Newton method. These three methods have been extensively analyzed in related studies [11], [12], [19]. Given that the cost function J(K) is non-convex, the properties outlined in Section IV play a crucial role in facilitating the convergence analysis for these policy gradient methods.…”

Section: Convergencementioning

confidence: 99%

“…This property is typically overlooked in the extant literature on both SOF and state-feedback LQR. Diverging from approaches that establish L-smoothness [19], we prove Hessian Lipschitz continuity through a direct application of its definition, thereby avoiding complex tensor operations. 2) Unlike state-feedback LQR, where theories of convergence often hinge on the concept of gradient dominance [11]- [14], [16], [18], the landscape of SOF problems presents greater complexities.…”

Section: Introductionmentioning

confidence: 99%

See 4 more Smart Citations

Optimization Landscape of Policy Gradient Methods for Discrete-Time Static Output Feedback

Duan,

Li,

Chen

et al. 2024

IEEE Trans. Cybern.

View full text Add to dashboard Cite

In recent times, significant advancements have been made in delving into the optimization landscape of policy gradient methods for achieving optimal control in linear time-invariant (LTI) systems. Compared with state-feedback control, outputfeedback control is more prevalent since the underlying state of the system may not be fully observed in many practical settings. This paper analyzes the optimization landscape inherent to policy gradient methods when applied to static output feedback (SOF) control in discrete-time LTI systems subject to quadratic cost. We begin by establishing crucial properties of the SOF cost, encompassing coercivity, L-smoothness, and M -Lipschitz continuous Hessian. Despite the absence of convexity, we leverage these properties to derive novel findings regarding convergence (and nearly dimension-free rate) to stationary points for three policy gradient methods, including the vanilla policy gradient method, the natural policy gradient method, and the Gauss-Newton method. Moreover, we provide proof that the vanilla policy gradient method exhibits linear convergence towards local minima when initialized near such minima. The paper concludes by presenting numerical examples that validate our theoretical findings. These results not only characterize the performance of gradient descent for optimizing the SOF problem but also provide insights into the effectiveness of general policy gradient methods within the realm of reinforcement learning.

show abstract

Section: Gradients and Hessianmentioning

confidence: 65%

“…In this section, we give the analytical expression for both the gradient and Hessian. The derivations follow similar lines as the state-feedback LQR case [11], [19].…”

Section: Gradients and Hessianmentioning

confidence: 87%

Section: Gradients and Hessianmentioning

confidence: 99%

Section: Convergencementioning

confidence: 99%