Policy Iteration Adaptive Dynamic Programming Algorithm for Discrete-Time Nonlinear Systems

Liu, Derong; Wei, Qinglai

doi:10.1109/tnnls.2013.2281663

Cited by 582 publications

(212 citation statements)

References 41 publications

Mentioning: 209

“… [17] And when j approaches infinity, the developed algorithm becomes a policy iteration [45]. Above all, we can conclude that the developed novel ADP algorithm is a general idea that unifies almost all ADP and reinforcement learning methods.…”

Section: Derivation of the Generalized Policy Iteration ADP Algorithm (mentioning)

confidence: 79%

“…Moreover, a control law that not only stabilizes system (1) but also makes the performance index function finite is said to be admissible [45]. For simplicity, system (1) can be represented as…”

Section: Derivation of the Generalized Policy Iteration ADP Algorithm (mentioning)

confidence: 99%

“… [17] On the other hand, when j approaches infinity, the developed algorithm can be considered a policy iteration algorithm [45]. Furthermore, the developed algorithm can accelerate the convergence rate without having to solve the HJB equation for the i-iteration. First, a nonquadratic function is used to derive the HJB equation for discrete-time nonlinear systems with actuator saturation.…”

Section: Introduction (mentioning)

confidence: 99%


Optimal control for discrete‐time systems with actuator saturation

Qiao, Wei, Zhao. 2017. Optim Control Appl Methods.

Self Cite


Summary: In this study, we use a generalized policy iteration approximate dynamic programming (ADP) algorithm to design an optimal controller for a class of discrete‐time systems with actuator saturation. An integral function is proposed to manage the saturation nonlinearity in the actuators, and the generalized policy iteration ADP algorithm is then developed to deal with the optimal control problem. Compared with other algorithms, the developed ADP algorithm includes 2 iteration procedures. In the present control scheme, 2 neural networks are introduced to approximate the control law and the performance index function. Furthermore, numerical simulations illustrate the convergence and feasibility of the developed method.
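The two iteration procedures mentioned in this summary, and the remark in the citation statements that the algorithm becomes a policy iteration as j approaches infinity, can be illustrated with a minimal sketch. It is not the cited authors' algorithm: it assumes a scalar linear system x_{k+1} = a*x_k + b*u_k with cost q*x^2 + r*u^2, no actuator saturation, and no neural networks, and every name in it (improve, evaluate, generalized_policy_iteration, riccati_fixed_point) is hypothetical. The outer loop improves the policy and the inner loop applies j evaluation backups.

```python
import math


def improve(p, a, b, r):
    """Greedy gain k for V(x) = p*x**2: the minimizer of
    q*x**2 + r*u**2 + p*(a*x + b*u)**2 over u = -k*x."""
    return a * b * p / (r + b * b * p)


def evaluate(p, k, a, b, q, r, j):
    """Apply j policy-evaluation backups under the fixed law u = -k*x."""
    for _ in range(j):
        p = q + r * k * k + (a - b * k) ** 2 * p
    return p


def generalized_policy_iteration(a, b, q, r, p0, j, iters):
    """Outer loop: policy improvement. Inner loop: j evaluation backups."""
    p = p0
    for _ in range(iters):
        k = improve(p, a, b, r)
        p = evaluate(p, k, a, b, q, r, j)
    return p, improve(p, a, b, r)


def riccati_fixed_point(a, b, q, r):
    """Positive root of the scalar discrete-time Riccati equation (b != 0)."""
    c = r - q * b * b - a * a * r
    return (-c + math.sqrt(c * c + 4.0 * b * b * q * r)) / (2.0 * b * b)


if __name__ == "__main__":
    a, b, q, r = 1.2, 1.0, 1.0, 1.0  # illustrative values only
    for j in (1, 5, 200):
        p, k = generalized_policy_iteration(a, b, q, r, p0=10.0, j=j, iters=50)
        print(j, p, k)
    print("Riccati solution:", riccati_fixed_point(a, b, q, r))
```

In this sketch, j = 1 reduces each outer step to one Riccati recursion step, while a large j evaluates each policy almost exactly before improving it, which is the policy-iteration limit. The starting value p0 = 10.0 is chosen so that the first greedy gain stabilizes this particular system.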



“…It is not always clear how to initialize the weights of the neural approximators (26). Commonly, small random numbers drawn from a uniform distribution are used [39], but there is no safety guarantee associated with random initialization. We propose initializing the weights as follows.…”

Section: A. Unconstrained ADP (mentioning)

confidence: 99%

Safe Approximate Dynamic Programming via Kernelized Lipschitz Estimation

Chakrabarty, Jha, Buzzard, et al. 2021. IEEE Trans. Neural Netw. Learning Syst.


We develop a method for obtaining safe initial policies for reinforcement learning via approximate dynamic programming (ADP) techniques for uncertain systems evolving with discrete-time dynamics. We employ kernelized Lipschitz estimation and semidefinite programming for computing admissible initial control policies with provably high probability. Such admissible controllers enable safe initialization and constraint enforcement while providing exponential stability of the equilibrium of the closed-loop system.


“…Both of the examples show the feasibility and effectiveness of the proposed algorithms. Keywords: approximation dynamic programming (ADP), continuous-time systems, integral reinforcement learning (IRL), online learning, value iteration. … heuristic dynamic programming (HDP), action-dependent HDP, dual HDP (DHP), action-dependent DHP, globalized DHP, and action-dependent GDHP [32, 33]. In addition, from an implementation point of view, the iteration schemes of ADP can be divided into 2 classes: policy iteration algorithms and value iteration algorithms. The implementation process of the policy iteration method should start with a given initial admissible policy (the definition will be given herein). However, by now, how to obtain an admissible policy is still an open issue.…”

mentioning

confidence: 99%

Online reinforcement learning for a class of partially unknown continuous‐time nonlinear systems via value iteration

Zhang et al. 2017. Optim Control Appl Methods.


Summary: In this paper, a modified value iteration–based approximate dynamic programming method is proposed for a class of affine nonlinear continuous‐time systems whose dynamics are partially unknown. The value iteration algorithm is established in an online fashion, and a convergence proof is given. To attenuate the effect of the unknown parts of the system dynamics, an integral reinforcement learning scheme is also used. The proposed approximate dynamic programming method uses a single‐network structure to estimate the value functions and the control policies. That is, the iteration process is implemented on the actor/critic structure, in which case only the critic NN needs to be identified. Then, a least‐squares scheme is derived for updating the NN weights. Finally, a linear system and a nonlinear system are tested to evaluate the performance of the proposed online value iteration algorithm. Both examples show the feasibility and effectiveness of the proposed algorithms.
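The citation statement above notes that policy iteration has to start from an admissible policy, and an earlier statement defines an admissible control law as one that stabilizes the system and keeps the performance index finite. A minimal sketch of why this matters follows. It assumes a scalar discrete-time linear system x_{k+1} = a*x_k + b*u_k with stage cost q*x^2 + r*u^2 and a linear law u = -k*x, which is a simplification and not the continuous-time setting of this paper. The helper names closed_loop_cost and is_admissible_linear are hypothetical.

```python
def closed_loop_cost(a, b, q, r, k, x0, horizon):
    """Accumulated cost of u = -k*x over a finite horizon, starting at x0."""
    x, total = x0, 0.0
    for _ in range(horizon):
        u = -k * x
        total += q * x * x + r * u * u
        x = a * x + b * u
    return total


def is_admissible_linear(a, b, k):
    """For this scalar linear case only: the closed loop x -> (a - b*k)*x
    is stable, and the infinite-horizon cost finite, iff |a - b*k| < 1."""
    return abs(a - b * k) < 1.0


if __name__ == "__main__":
    a, b, q, r = 1.2, 1.0, 1.0, 1.0  # illustrative values only
    for k in (0.8, 0.0):
        costs = [closed_loop_cost(a, b, q, r, k, 1.0, n) for n in (25, 50, 100)]
        print(k, is_admissible_linear(a, b, k), costs)
```

With the stabilizing gain the truncated cost settles to a finite limit as the horizon grows, so policy evaluation is well defined. With the non-stabilizing gain the cost keeps growing with the horizon, and there is no finite value function for the first evaluation step to compute.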
