2014
DOI: 10.1109/tnnls.2013.2281663

Policy Iteration Adaptive Dynamic Programming Algorithm for Discrete-Time Nonlinear Systems

Abstract: This paper is concerned with a new discrete-time policy iteration adaptive dynamic programming (ADP) method for solving the infinite-horizon optimal control problem of nonlinear systems. The idea is to use an iterative ADP technique to obtain the iterative control law, which optimizes the iterative performance index function. The main contribution of this paper is to analyze the convergence and stability properties of the policy iteration method for discrete-time nonlinear systems for the first time. It shows that…
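For orientation, the infinite-horizon problem the abstract refers to is usually posed as follows; the notation (F for the dynamics, U for the utility function, J for the performance index) is the common discrete-time ADP convention and is assumed here rather than quoted from the paper.

```latex
% Common discrete-time infinite-horizon setting (notation assumed, requires amsmath):
% nonlinear dynamics, accumulated performance index, and the optimality equation
% satisfied by the optimal performance index function J^*.
\begin{gather*}
x_{k+1} = F(x_k, u_k), \qquad k = 0, 1, 2, \ldots \\
J\bigl(x_0, \{u_k\}_{k=0}^{\infty}\bigr) = \sum_{k=0}^{\infty} U(x_k, u_k) \\
J^{*}(x_k) = \min_{u_k} \Bigl\{\, U(x_k, u_k) + J^{*}\bigl(F(x_k, u_k)\bigr) \Bigr\}
\end{gather*}
```

Policy iteration ADP approximates J* and the corresponding control law by alternating policy evaluation and policy improvement, which is the structure the citation statements below discuss.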

Cited by 582 publications, with 212 citation statements (3 supporting, 209 mentioning, 0 contrasting). References 41 publications.

Selected citation statements:
“…[17] And when j approaches infinity, the developed algorithm becomes a policy iteration. [45] Above all, we can conclude that the developed novel ADP algorithm is a general idea that unifies almost all ADP and reinforcement learning methods.…”
Section: Derivation of the Generalized Policy Iteration ADP Algorithm
Mentioning, confidence: 79%
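The remark about j refers to the number of value-update sweeps performed under a fixed control law before that law is improved. A hedged sketch of that generalized policy iteration structure, in assumed notation (V_i for the iterative value function, v_i for the iterative control law), is:

```latex
% Usual structure of generalized policy iteration ADP (notation assumed, requires amsmath).
% Each iteration i performs j_i evaluation sweeps under the fixed law v_i, starting from
% the previous value estimate, and then improves the law. j_i = 1 recovers value
% iteration; j_i -> infinity recovers policy iteration, as the quoted remark notes.
\begin{align*}
V_i^{(j+1)}(x_k) &= U\bigl(x_k, v_i(x_k)\bigr) + V_i^{(j)}\bigl(F(x_k, v_i(x_k))\bigr),
  \qquad j = 0, 1, \ldots, j_i - 1, \\
v_{i+1}(x_k) &= \arg\min_{u_k} \Bigl\{\, U(x_k, u_k) + V_i^{(j_i)}\bigl(F(x_k, u_k)\bigr) \Bigr\}.
\end{align*}
```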
“…Moreover, a control law which not only stabilizes the system (1) but also makes the performance index function finite is said to be admissible. [45] For simplicity, the system (1) can be represented as…”
Section: Derivation of the Generalized Policy Iteration ADP Algorithm
Mentioning, confidence: 99%
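A hedged restatement of the admissibility condition referenced here, in its standard discrete-time form (the exact wording in the cited papers may differ slightly):

```latex
% Standard notion of an admissible control law on a compact set Omega
% (assumed wording, requires amsmath for the cases environment).
\[
v \text{ is admissible on } \Omega \iff
\begin{cases}
  v \text{ is continuous on } \Omega \text{ and } v(0) = 0, \\
  x_{k+1} = F\bigl(x_k, v(x_k)\bigr) \text{ is asymptotically stable on } \Omega, \\
  J(x_0, v) = \sum_{k=0}^{\infty} U\bigl(x_k, v(x_k)\bigr) < \infty \quad \text{for all } x_0 \in \Omega.
\end{cases}
\]
```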
“…It is not always clear how to initialize the weights of the neural approximators (26). Commonly, small random numbers drawn from a uniform distribution are used [39], but there is no safety guarantee associated with random initialization. We propose initializing the weights as follows.…”
Section: A. Unconstrained ADP
Mentioning, confidence: 99%
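The quote contrasts the common small-uniform-random initialization with a safer scheme that the citing paper then proposes; that proposed scheme is not reproduced in this excerpt. A minimal NumPy sketch of the common practice being criticized (the layer sizes, activation, and init range are illustrative assumptions, not values from any of the papers):

```python
import numpy as np

def init_critic_weights(n_in, n_hidden, scale=0.1, rng=np.random.default_rng(0)):
    """Common, safety-agnostic initialization of a single-hidden-layer critic
    approximator V(x) ~ w2 @ tanh(W1 @ x): small uniform random weights.
    The range 'scale' is an illustrative assumption."""
    W1 = rng.uniform(-scale, scale, size=(n_hidden, n_in))  # input-to-hidden weights
    w2 = rng.uniform(-scale, scale, size=(n_hidden,))       # hidden-to-output weights
    return W1, w2

def critic_value(x, W1, w2):
    """Evaluate the approximate performance index at state x."""
    return float(w2 @ np.tanh(W1 @ x))

# Example: a 2-state system approximated with 10 hidden neurons.
W1, w2 = init_critic_weights(n_in=2, n_hidden=10)
print(critic_value(np.array([0.5, -0.3]), W1, w2))
```

The point of the quoted passage is that such a random start says nothing about whether the resulting initial control law is admissible or safe, which is what motivates a structured initialization.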
“…Both of the examples show the feasibility and effectiveness of the proposed algorithms. …heuristic dynamic programming (HDP), action-dependent HDP, dual HDP (DHP), action-dependent DHP, globalized DHP (GDHP), and action-dependent GDHP. [32,33] In addition, from an implementation point of view, the iteration schemes of ADP can be divided into two classes: policy iteration algorithms and value iteration algorithms. The implementation process of the policy iteration method should start with a given initial admissible policy (the definition will be given herein). However, by now, how to obtain an admissible policy is still an open issue.…”
Mentioning, confidence: 99%
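Since this excerpt stresses that policy iteration must begin from an admissible policy, here is a minimal self-contained sketch of that structure (evaluation of the current control law, then greedy improvement) on an assumed scalar example; the dynamics, grids, sweep counts, and the deadbeat initial law are illustrative assumptions, not taken from the cited papers.

```python
import numpy as np

# Illustrative discrete-time system x_{k+1} = 0.8*sin(x_k) + u_k with
# utility U(x, u) = x^2 + u^2, solved on a state/control grid.
X = np.linspace(-1.0, 1.0, 201)        # state grid
U_GRID = np.linspace(-1.0, 1.0, 201)   # control grid

def f(x, u):                           # assumed dynamics
    return 0.8 * np.sin(x) + u

def util(x, u):                        # stage cost / utility
    return x**2 + u**2

def evaluate(policy, V, sweeps=200):
    """Policy evaluation: iterate V(x) = U(x, v(x)) + V(F(x, v(x))) on the grid.
    np.interp clamps states that leave the grid, which is acceptable for a sketch."""
    for _ in range(sweeps):
        x_next = f(X, policy)
        V = util(X, policy) + np.interp(x_next, X, V)
    return V

def improve(V):
    """Policy improvement: greedy control with respect to the current value estimate."""
    x_next = f(X[:, None], U_GRID[None, :])          # all (state, control) pairs
    cost = util(X[:, None], U_GRID[None, :]) + np.interp(x_next, X, V)
    return U_GRID[np.argmin(cost, axis=1)]

# Start from an admissible initial law: u = -0.8*sin(x) drives the state to the
# origin in one step, so it is stabilizing and its accumulated cost is finite.
policy = -0.8 * np.sin(X)
V = np.zeros_like(X)
for _ in range(10):                                  # policy iteration loop
    V = evaluate(policy, V)
    policy = improve(V)

print("approximate optimal cost at x0 = 0.5:", np.interp(0.5, X, V))
```

A random initial law would carry no such admissibility guarantee, which is exactly the open issue the quoted passage points to.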