Reinforcement learning (RL) is a data-driven approach to synthesizing an optimal control policy. A barrier to the wide implementation of RL-based controllers is their data-hungry nature during online training and their inability to extract useful information from human operator and historical process operation data. Here, we present a two-step framework to resolve this challenge. First, we employ apprenticeship learning via inverse RL to analyze historical process data for synchronous identification of a reward function and parameterization of the control policy. This is conducted offline. Second, the parameterization is improved efficiently online via RL, within only a few iterations, while the process is running. Significant advantages of this framework include allowing for the hot-start of RL algorithms for process optimal control and robust abstraction of existing controllers and control knowledge from data. The framework is demonstrated on three case studies, showing its potential for chemical process control.
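To make the two steps concrete, the sketch below pairs a feature-matching inverse-RL loop (in the spirit of Abbeel and Ng's apprenticeship learning) with a few finite-difference policy-gradient iterations on a toy first-order process. This is a minimal illustration only: the feature map, process model, proportional policy, and all hyperparameters are assumptions for the sketch, not the framework's actual implementation.

```python
# Minimal sketch of the two-step framework: offline inverse RL on
# "historical" data, then a few online RL iterations to refine the policy.
# Assumes a linear reward r(s) = w . phi(s); all names are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def features(state):
    """Illustrative feature map phi(s) for a scalar process state."""
    return np.array([state, state**2, abs(state - 1.0)])

def rollout(policy_gain, horizon=50):
    """Simulate a toy first-order process x_{t+1} = 0.9 x_t + u_t + noise
    and return the empirical feature expectations of the trajectory."""
    x, phis = 2.0, []
    for _ in range(horizon):
        u = -policy_gain * (x - 1.0)           # proportional policy toward setpoint 1.0
        x = 0.9 * x + u + 0.05 * rng.standard_normal()
        phis.append(features(x))
    return np.mean(phis, axis=0)

# --- Step 1 (offline): inverse RL on stand-in historical operator data ---
mu_expert = rollout(policy_gain=0.8)           # plays the role of expert demonstrations
w = rng.standard_normal(3)                     # reward weights to be identified
gain = 0.1                                     # initial policy parameterization
for _ in range(20):
    mu_pi = rollout(gain)
    w = mu_expert - mu_pi                      # feature-matching reward update
    w /= np.linalg.norm(w) + 1e-8
    # improve the policy under the current reward by a crude line search
    candidates = gain + np.linspace(-0.2, 0.2, 9)
    gain = max(candidates, key=lambda g: w @ rollout(g))

# --- Step 2 (online): a few RL iterations to refine the hot-started policy
for _ in range(5):
    eps = 0.05
    up, down = w @ rollout(gain + eps), w @ rollout(gain - eps)
    gain += 0.1 * (up - down) / (2 * eps)      # finite-difference policy gradient

print(f"learned reward weights {w.round(2)}, refined gain {gain:.2f}")
```

Because the policy is hot-started from the offline step, the online loop needs only a handful of iterations, which mirrors the data-efficiency argument made above.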
KEYWORDS
apprenticeship learning, inverse reinforcement learning, machine learning, optimal control, reinforcement learning
1 | INTRODUCTION

Recent initiatives for efficiency improvements in industrial process operation have driven interest in the development of high-performance, advanced process control (APC) schemes. Reinforcement learning (RL) has achieved impressive results on benchmark game-based control tasks,1,2 providing an avenue for research into its translation to APC. In