2020
DOI: 10.1002/aic.16544
Model‐based reinforcement learning for nonlinear optimal control with practical asymptotic stability guarantees

Abstract: We propose a new reinforcement learning approach for nonlinear optimal control in which the value function is updated under the restriction that it remain a control Lyapunov function (CLF), and the policy is improved using a variation of Sontag's formula. Practical asymptotic stability of the closed‐loop system is guaranteed both during training and at the end of training, without requiring an additional actor network and its update rule. For a single‐layer neural network (NN) with exact basis functions, the approximate function conv…
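The abstract refers to a variation of Sontag's formula for the policy-improvement step. For context, the sketch below implements the standard (unmodified) universal formula for a control-affine system $\dot{x} = f(x) + g(x)u$ with CLF $V$; the function name, interface, and zero-gradient tolerance are illustrative assumptions, not the paper's implementation.

import numpy as np

def sontag_control(LfV: float, LgV: np.ndarray) -> np.ndarray:
    """Sontag's universal formula for a control-affine system
    x_dot = f(x) + g(x) u, given the Lie derivatives
    LfV = dV/dx . f(x) and LgV = dV/dx . g(x) of a CLF V."""
    b2 = float(LgV @ LgV)        # ||LgV||^2
    if b2 < 1e-12:               # LgV = 0: the formula prescribes u = 0
        return np.zeros_like(LgV)
    # u = -((LfV + sqrt(LfV^2 + ||LgV||^4)) / ||LgV||^2) * LgV^T
    k = (LfV + np.sqrt(LfV**2 + b2**2)) / b2
    return -k * LgV

By construction, this control makes dV/dt = LfV + LgV.u = -sqrt(LfV^2 + ||LgV||^4) < 0 wherever LgV is nonzero, which is the decrease property the CLF framework relies on.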

Cited by 13 publications (6 citation statements)
References 73 publications
“…Thus, for an arbitrarily small $\delta_1$, we can set the compact set $D_m = \mathrm{Int}\,\bar{X} \setminus \Delta_{\delta_1}$ by excluding an arbitrarily thin boundary layer from $\mathrm{Int}\,\bar{X}$. Then, there exists an $N(\delta, \delta_1)$ such that if $\hat{V}_k$ satisfies the CLF condition on $N(\delta, \delta_1)$ grid points, then $\hat{V}_k$ is a CLF on the domain $D_m$, as proven in Proposition 1 of Reference 39. Theorem: Given a constrained set $\bar{X}$ defined by Equation () with a continuously differentiable function $h(\bar{x})$, the system in Equation () is practically asymptotically stable on $D_m = \mathrm{Int}\,\bar{X} \setminus \Delta_{\delta_1}$ under the controller $\hat{\psi}_k$ for all $k$ and for an arbitrarily small $\delta_1 > 0$. With the largest $\rho_k$, $\Omega_{\rho_k} = \{\bar{x} \mid \hat{V}_k(\bar{x}) \le \rho_k\} \subseteq D_m$ is the estimate of the ROA.…”
Section: Safe RL for Constrained Nonlinear Systems
Mentioning confidence: 88%
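The statement above certifies the CLF property from finitely many grid points and then estimates the ROA as the largest sublevel set $\Omega_{\rho_k}$ of $\hat{V}_k$ contained in $D_m$. Below is a minimal numerical sketch of such a grid-based check; the interface (V, clf_decrease, in_Dm) and the brute-force level search are our illustrative assumptions, not the authors' algorithm.

import numpy as np

def roa_estimate(V, clf_decrease, grid_points, in_Dm):
    """Grid-based CLF check and sublevel-set ROA estimate.

    V(x)            -> value of the candidate CLF at x
    clf_decrease(x) -> True if the CLF decrease condition holds at x
                       (e.g., min_u dV/dx . (f(x) + g(x) u) < 0)
    grid_points     -> (N, n) array of grid points covering the domain
    in_Dm           -> boolean mask: which grid points lie in D_m
    Returns the largest level rho such that every grid point with
    V(x) <= rho lies in D_m and satisfies the decrease condition;
    None if no such level exists."""
    vals = np.array([V(x) for x in grid_points])
    ok = np.array([clf_decrease(x) for x in grid_points])
    # Candidate levels: V-values at points where both conditions hold,
    # tried from largest to smallest so the first admissible one wins.
    for rho in np.sort(vals[ok & in_Dm])[::-1]:
        inside = vals <= rho
        if np.all(in_Dm[inside] & ok[inside]):
            return float(rho)
    return None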
“…The similarity of the level-set shapes of two scalar functions can be quantified by calculating the standard deviation of the element-wise division of their gradient vectors.39 If we knew the optimal value function precisely, this measure could be used to assess how similar the trained CLF is to the optimal value function. However, determining the optimal value function is difficult, which is why we use RL to learn the optimal control policy along with the optimal value function.…”
Section: CLF and Sontag's Formula
Mentioning confidence: 99%
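A minimal sketch of the gradient-ratio measure described above, assuming the gradients of both scalar functions are available as callables. If the two functions share level-set shapes, their gradients are parallel, so the element-wise ratio has nearly identical components at each point and its standard deviation is near zero. The near-zero-denominator guard and the averaging over sample points are our assumptions.

import numpy as np

def level_set_similarity(grad_V1, grad_V2, points, eps=1e-9):
    """Level-set shape dissimilarity between two scalar functions.

    grad_V1(x), grad_V2(x) -> gradient vectors at a point x
    points                 -> iterable of sample points
    Returns a scalar score; smaller means more similar level sets."""
    scores = []
    for x in points:
        g1, g2 = np.asarray(grad_V1(x), float), np.asarray(grad_V2(x), float)
        # Element-wise division; skip components where g2 is near zero.
        ratio = g1 / np.where(np.abs(g2) < eps, np.nan, g2)
        scores.append(np.nanstd(ratio))  # zero if gradients are parallel
    return float(np.mean(scores))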
“…Instead, RL learns from experience of the process, allowing π(·) to be recalibrated as the process evolves through time via process data.5 Furthermore, RL has shown significant industrial potential, as demonstrated in a number of research works that have explored its application to the calibration of PID controllers;6 set-point tracking;7 dynamic optimization of nonlinear, stochastic systems;5,8,9 de novo drug10 and protein design;11 and the augmentation of the performance of various model predictive control (MPC) approaches.12,13 Indeed, the potential use of RL invites discussion of its relation to MPC in the development of APC schemes.…”
Mentioning confidence: 99%
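As a rough illustration of recalibrating a policy from process data, the sketch below performs a generic linear TD(0) value update over logged transitions; it is a textbook placeholder under assumed names (theta, features), not the update rule of the paper or of any cited work.

import numpy as np

def recalibrate_value(theta, transitions, features, gamma=0.99, alpha=1e-2):
    """One pass of temporal-difference recalibration from process data.

    theta       -> weights of a linear value estimate V(x) = theta . phi(x)
    transitions -> iterable of (x, cost, x_next) tuples logged from the process
    features    -> phi(x), the feature map
    The policy pi(.) induced by the updated value estimate is thereby
    recalibrated as new operating data arrive."""
    for x, cost, x_next in transitions:
        phi, phi_next = features(x), features(x_next)
        td_error = cost + gamma * (theta @ phi_next) - (theta @ phi)
        theta = theta + alpha * td_error * phi  # gradient-style TD(0) step
    return theta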