2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
DOI: 10.1109/adprl.2011.5967351

Data-based adaptive critic design for discrete-time zero-sum games using output feedback

Abstract: In this paper, a novel data-based adaptive critic design (ACD) using output feedback is proposed for discrete-time zero-sum games. The proposed data-based ACD is in effect a direct adaptive output-feedback control scheme. The main contribution of this paper is that neither knowledge of the system model nor information about the system states is required. Only measured input and output data are needed to reach the saddle point of the zero-sum game with the proposed data-based ACD.
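
For orientation, the saddle-point objective of such a discrete-time zero-sum game is commonly written as below; the quadratic stage cost and the attenuation level \(\gamma\) are standard choices in this literature rather than details quoted from the abstract:

\[
V^*(x_k) = \min_{u_k}\max_{w_k}\Bigl\{ x_k^{\top} Q x_k + u_k^{\top} R u_k - \gamma^{2} w_k^{\top} w_k + V^*(x_{k+1}) \Bigr\},
\]

with the saddle point \((u^*, w^*)\) characterized by the game cost \(J\) satisfying

\[
J(u^*, w) \;\le\; J(u^*, w^*) \;\le\; J(u, w^*) \quad \text{for all admissible } u, w .
\]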

Cited by 13 publications (7 citation statements); references 30 publications. Citing publications span 2012 to 2024.

Citation statements (ordered by relevance):
“…The dual-input zero-sum system was optimized with an ADP structure in Reference 23, which proved the existence of zero-sum game equilibrium points. In Reference 24, the authors applied measurable system data to search online for the Nash policies with an adaptive evaluation structure. In Reference 25, an IRL scheme was proposed to study the zero-sum Nash equilibrium online, which enhances the offline learning ability.…”
Section: Introduction (mentioning)
Confidence: 99%
“…In [33], an online adaptive robust dynamic programming algorithm using a policy iteration scheme was considered for ZS-TP-G of continuous-time unknown systems subject to uncertainties. In [34], a data-based adaptive critic method using output feedback for unknown model and system states was described under a disturbance measurement assumption. In [35], a data-based policy iteration Q-learning algorithm for ZS-TP-G was developed for linear systems to eliminate the need for knowledge of the process dynamics.…”
Section: Introduction (mentioning)
Confidence: 99%
“…These IO samples are subsequently used to build a so-called virtual state that defines a virtual state-space model transformation of the original system. Unfortunately, the approach has been tackled for linear systems only in a number of recent works [34], [41], [42], and not for general nonlinear systems, to the authors' best knowledge. This last remark serves as an incentive for one of this work's main contributions.…”
Section: Introduction (mentioning)
Confidence: 99%
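
The virtual-state construction referenced above can be made concrete for an observable linear system: the current state is a linear function of a finite window of past inputs and outputs. A minimal sketch, where the window length \(N\) and the reconstruction matrix \(M\) are generic symbols rather than quantities taken from [34], [41], [42]:

\[
z_k = \bigl[\, y_{k-1}^{\top} \;\cdots\; y_{k-N}^{\top} \;\; u_{k-1}^{\top} \;\cdots\; u_{k-N}^{\top} \,\bigr]^{\top},
\qquad x_k = M z_k ,
\]

so any value function or policy defined on \(x_k\) can be re-expressed on the measured vector \(z_k\), which is what allows output-feedback learning without a state observer; in the game setting, a measured disturbance sequence can be appended to the window in the same way.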
“…Recently, a few ADP-based algorithms have been proposed to solve the HJI equations and GAREs of discrete-time dynamic systems without knowing the system dynamic matrices. In , the Q-learning technique was used to find the optimal strategies for discrete-time linear quadratic zero-sum games related to the H∞ optimal control problem.…”
Section: Introduction (mentioning)
Confidence: 99%
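
For reference, the Q-function underlying such zero-sum Q-learning augments the state with both players' actions. A minimal sketch under the standard quadratic-cost setup, where \(Q\), \(R\), \(\gamma\), and the kernel \(H\) are generic symbols, not values from the cited works:

\[
Q^*(x_k,u_k,w_k) = x_k^{\top} Q x_k + u_k^{\top} R u_k - \gamma^{2} w_k^{\top} w_k + V^*(x_{k+1})
= \begin{bmatrix} x_k \\ u_k \\ w_k \end{bmatrix}^{\top} H \begin{bmatrix} x_k \\ u_k \\ w_k \end{bmatrix}.
\]

Setting \(\partial Q^*/\partial u_k = 0\) and \(\partial Q^*/\partial w_k = 0\) yields linear saddle-point feedbacks \(u_k = -Kx_k\) and \(w_k = -Lx_k\) whose gains come from blocks of \(H\); since \(H\) can be identified from measured data alone, the GARE is solved without the system matrices.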
“…In , a model-free H∞ control design algorithm was presented to solve the GARE for unknown linear discrete-time systems via Q-learning and LMI. In , a novel data-based adaptive critic design using output feedback was proposed for discrete-time zero-sum games, in which neither knowledge of the system model nor information about the system states is required. In , an iterative approach to approximating the HJI equation using a neural network (NN) was presented, and the requirement of full knowledge of the internal dynamics of the nonlinear DT system was relaxed through a second NN online approximator.…”
Section: Introduction (mentioning)
Confidence: 99%
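
As a concrete illustration of such data-based Q-learning for zero-sum games, the sketch below runs policy-iteration Q-learning on a small linear quadratic zero-sum game: it fits the Q-function kernel H from simulated input-state data by least squares on the Bellman equation and updates both players' feedback gains from the blocks of H. This is a state-feedback simplification, not the output-feedback algorithm of the paper above; all system matrices, noise levels, and the attenuation level gamma are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

# Illustrative plant x_{k+1} = A x + B u + E w; used only to generate data.
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
E = np.array([[0.5], [0.0]])
n, m, q = 2, 1, 1
Qc, Rc, gamma = np.eye(n), np.eye(m), 5.0   # cost weights, attenuation level

K = np.zeros((m, n))   # initial control policy  u = -K x
L = np.zeros((q, n))   # initial disturbance policy  w = -L x

def stage_cost(x, u, w):
    # Quadratic zero-sum stage cost: x'Qx + u'Ru - gamma^2 w'w
    return float(x @ Qc @ x + u @ Rc @ u - gamma**2 * (w @ w))

for it in range(15):
    # --- Policy evaluation: fit Q(z) = z' H z from data, z = [x; u; w] ---
    Phi, tgt = [], []
    x = rng.standard_normal(n)
    for k in range(80):
        u = -K @ x + 0.5 * rng.standard_normal(m)   # exploration noise
        w = -L @ x + 0.5 * rng.standard_normal(q)
        x_next = A @ x + B @ u + E @ w
        z = np.concatenate([x, u, w])
        # Successor actions follow the current policies (no noise).
        z_next = np.concatenate([x_next, -K @ x_next, -L @ x_next])
        Phi.append(np.kron(z, z) - np.kron(z_next, z_next))
        tgt.append(stage_cost(x, u, w))
        x = x_next
        if np.linalg.norm(x) > 50:                  # keep the rollout bounded
            x = rng.standard_normal(n)
    h, *_ = np.linalg.lstsq(np.array(Phi), np.array(tgt), rcond=None)
    H = h.reshape(n + m + q, n + m + q)
    H = 0.5 * (H + H.T)                             # enforce symmetry

    # --- Policy improvement: saddle-point gains from the blocks of H ---
    Hux, Huu, Huw = H[n:n+m, :n], H[n:n+m, n:n+m], H[n:n+m, n+m:]
    Hwx, Hwu, Hww = H[n+m:, :n], H[n+m:, n:n+m], H[n+m:, n+m:]
    K = np.linalg.solve(Huu - Huw @ np.linalg.solve(Hww, Hwu),
                        Hux - Huw @ np.linalg.solve(Hww, Hwx))
    L = np.linalg.solve(Hww - Hwu @ np.linalg.solve(Huu, Huw),
                        Hwx - Hwu @ np.linalg.solve(Huu, Hux))

print("saddle-point feedback gains:\nK =", K, "\nL =", L)

Note that nowhere in the learning loop are A, B, or E used directly: the kernel H is identified purely from measured (x, u, w, x') tuples, which is the sense in which such schemes are model-free.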