2021
DOI: 10.1109/tnnls.2020.2978805

Safe Approximate Dynamic Programming via Kernelized Lipschitz Estimation

Abstract: We develop a method for obtaining safe initial policies for reinforcement learning via approximate dynamic programming (ADP) techniques for uncertain systems evolving with discrete-time dynamics. We employ kernelized Lipschitz estimation and semidefinite programming for computing admissible initial control policies with provably high probability. Such admissible controllers enable safe initialization and constraint enforcement while providing exponential stability of the equilibrium of the closed-loop system.

Cited by 15 publications (4 citation statements)
References 49 publications (70 reference statements)
“…Broadly, there are two ways for estimating Lipschitz constants of general nonlinear functions, either sampling-based as in [16] and [17], or using optimization techniques [7], [18]. A naive approach is to calculate the product of the norm of the weights of each individual layer.…”
Section: A Related Work (mentioning)
confidence: 99%
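
The two families of estimates mentioned in the statement above can be contrasted with a small sketch. It assumes a bias-free feedforward network with ReLU activations (which are 1-Lipschitz), so the product of the layer spectral norms is a valid upper bound, while the largest slope observed over random input pairs is only a lower bound. The function names, layer sizes, and sampling distribution are illustrative assumptions and are not taken from the cited papers; in particular, the sampling step here is a plain maximum over pairs, not the kernelized estimator of the original work.

```python
import numpy as np


def naive_lipschitz_bound(weights):
    """Upper bound: product of the spectral norms of the layer weights."""
    return float(np.prod([np.linalg.norm(W, 2) for W in weights]))


def relu_network(x, weights):
    """Bias-free feedforward network with ReLU on every hidden layer."""
    h = x
    for W in weights[:-1]:
        h = np.maximum(W @ h, 0.0)
    return weights[-1] @ h


def sampled_lipschitz_estimate(weights, n_pairs=2000, seed=0):
    """Lower bound: largest observed slope over random input pairs."""
    rng = np.random.default_rng(seed)
    dim = weights[0].shape[1]
    best = 0.0
    for _ in range(n_pairs):
        x = rng.normal(size=dim)
        y = rng.normal(size=dim)
        gap = np.linalg.norm(x - y)
        if gap > 0.0:
            diff = relu_network(x, weights) - relu_network(y, weights)
            best = max(best, float(np.linalg.norm(diff) / gap))
    return best


# Hypothetical 4 -> 8 -> 8 -> 2 network with random weights.
rng = np.random.default_rng(1)
weights = [
    rng.normal(size=(8, 4)),
    rng.normal(size=(8, 8)),
    rng.normal(size=(2, 8)),
]
upper = naive_lipschitz_bound(weights)
lower = sampled_lipschitz_estimate(weights)
print(f"sampled lower bound: {lower:.3f}, naive upper bound: {upper:.3f}")
```

The gap between the two numbers illustrates why the product-of-norms bound is called naive: it is always valid under the stated assumptions but can be very conservative, whereas a raw sampling estimate can only underestimate the true constant unless it is paired with a probabilistic overestimation argument.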
“…For basis functions whose Lipschitz constants are not analytically computable, one could use the sampling-based kernelized learning method discussed in [27, Section III] to obtain an overestimate of the Lipschitz constant with high probability. With the estimate Lφ , we can use LMIs to obtain a redesigned gain L. The following theorem encapsulates these redesign conditions.…”
Section: Observer Gain Redesign (mentioning)
confidence: 99%
“…then the redesigned observer (27) with gain L = P^{-1} K makes the error dynamics (3) L-ISS with respect to e_p, with an improvement of the convergence rate compared with (11), quantified by a Lyapunov function decrease bound as…”
Section: Observer Gain Redesign (mentioning)
confidence: 99%