Global Convergence of Policy Gradient Primal–Dual Methods for Risk-Constrained LQRs

Zhao, Feiran; You, Keyou; Başar, Tamer

doi:10.1109/tac.2023.3234176

Cited by 13 publications

(17 citation statements)

References 23 publications


“…For reinforcement learning (RL) [9], the focus has been on learning the system dynamics and providing closed-loop guarantees in finite-time for both linear [16,24,48] and nonlinear systems [7,37,49,76]. For model-free RL, [32,62,66,100] proved the convergence of policy optimization to the optimal controller for LTI systems, [63,67] for LTV systems, [82] for partially observed linear systems. For a review of policy optimization (PO) methods for LQR, H∞ control, risk-sensitive control, LQG, and output feedback synthesis, see [34].…”

Section: Control Design Problems For Hyperbolic PDEs Are Hyperbolic (mentioning)

confidence: 99%

Neural Operators of Backstepping Controller and Observer Gain Functions for Reaction-Diffusion PDEs

Krstić, Bhan, Shi

2023

Preprint


Unlike ODEs, whose models involve system matrices and whose controllers involve vector or matrix gains, PDE models involve functions in those roles: functional coefficients, dependent on the spatial variables, and gain functions dependent on space as well. The designs of gains for controllers and observers for PDEs, such as PDE backstepping, are mappings of system model functions into gain functions. These infinite-dimensional nonlinear operators are given in an implicit form through PDEs, in spatial variables, which need to be solved to determine the gain function for each new functional coefficient of the PDE. The need for solving such PDEs can be eliminated by learning and approximating the said design mapping in the form of a neural operator. Learning the neural operator requires a sufficient number of prior solutions for the design PDEs, computed offline, as well as the training of the operator. In recent work, we developed the neural operators for PDE backstepping designs for first-order hyperbolic PDEs. Here we extend this framework to the more complex class of parabolic PDEs. The key theoretical question is whether the controllers are still stabilizing, and whether the observers are still convergent, if they employ the approximate functional gains generated by the neural operator. We provide affirmative answers to these questions, namely, we prove stability in closed loop under gains produced by neural operators. We illustrate the theoretical results with numerical tests and publish our code on GitHub. The neural operators are three orders of magnitude faster in generating gain functions than PDE solvers for such gain functions. This opens up the opportunity for the use of this neural operator methodology in adaptive control and in gain scheduling control for nonlinear PDEs.



“…In this paper, we take an iterative PO perspective to solve (7) viewing G as the optimization matrix. We aim to design a gradient-based method to find an optimal G while maintaining feasibility, and recover the control from (5). As (7) is a challenging constrained nonconvex problem, we leverage a novel convex parameterization to establish the global convergence.…”

Section: B Direct Data-driven Formulation (mentioning)

confidence: 99%

“…In this section, we first present our novel PO method for solving (7). Then, we propose a new strongly convex parameterization of (7) to derive the projected gradient dominance property of J(G).…”

Section: Data-enabled Policy Optimization (mentioning)

confidence: 99%

“…Based on zeroth-order optimization techniques, it uses multiple system trajectories to estimate the policy gradient. There has been a resurgent interest in studying theoretical properties of PO on the LQR problem such as convergence and sample complexity; see e.g., [4]-[7] and the comprehensive survey [8]. Even though global convergence has been shown for the nonconvex PO…”

Section: Introduction (mentioning)

confidence: 99%


Infinite-horizon Risk-constrained Linear Quadratic Regulator with Average Cost

Zhao, You, Başar

2021

2021 60th IEEE Conference on Decision and Control (CDC)


Policy optimization (PO), an essential approach of reinforcement learning for a broad range of system classes, requires significantly more system data than indirect (identification-followed-by-control) methods or behavioral-based direct methods even in the simplest linear quadratic regulator (LQR) problem. In this paper, we take an initial step towards bridging this gap by proposing the data-enabled policy optimization (DeePO) method, which requires only a finite amount of sufficiently exciting data to iteratively solve the LQR via PO. Based on a data-driven closed-loop parameterization, we are able to directly compute the policy gradient from a batch of persistently exciting data. Next, we show that the nonconvex PO problem satisfies a projected gradient dominance property by relating it to an equivalent convex program, leading to the global convergence of DeePO. Moreover, we apply regularization methods to enhance certainty-equivalence and robustness of the resulting controller and show an implicit regularization property. Finally, we perform simulations to validate our results.


“…The main focus has been on learning the system dynamics and providing closed-loop guarantees in finite-time for both linear systems [15], [23], [29], [42], [77] (and references within), and nonlinear systems [5], [35], [43], [71]. For model-free RL methods, [30], [56], [60], [90] proved the convergence of policy optimization, a popular model-free RL method, to the optimal controller for linear time-invariant systems, [58], [61] for linear time-varying systems, [75] for partially observed linear systems. See [32] for a recent review of policy optimization methods for continuous control problems such as the LQR, H∞ control, risk-sensitive control, LQG, and output feedback synthesis.…”

Section: Introduction (mentioning)

confidence: 99%

Neural Operators for Bypassing Gain and Control Computations in PDE Backstepping

Bhan, Shi, Krstić

2023

Preprint


We introduce a framework for eliminating the computation of controller gain functions in PDE control. We learn the nonlinear operator from the plant parameters to the control gains with a (deep) neural network. We provide closed-loop stability guarantees (global exponential) under an NN-approximation of the feedback gains. While, in the existing PDE backstepping, finding the gain kernel requires (one offline) solution to an integral equation, the neural operator (NO) approach we propose learns the mapping from the functional coefficients of the plant PDE to the kernel function by employing a sufficiently high number of offline numerical solutions to the kernel integral equation, for a large enough number of the PDE model's different functional coefficients. We prove the existence of a DeepONet approximation, with arbitrarily high accuracy, of the exact nonlinear continuous operator mapping PDE coefficient functions into gain functions. Once proven to exist, learning of the NO is standard, completed "once and for all" (never online), and the kernel integral equation doesn't need to be solved ever again, for any new functional coefficient not exceeding the magnitude of the functional coefficients used for training. We also present an extension from approximating the gain kernel operator to approximating the full feedback law mapping, from plant parameter functions and state measurement functions to the control input, with semiglobal practical stability guarantees. Simulation illustrations are provided and code is available on GitHub. This framework, eliminating real-time recomputation of gains, has the potential to be game-changing for adaptive control of PDEs and gain scheduling control of nonlinear PDEs. The paper requires no prior background in machine learning or neural networks.

