2020
DOI: 10.1109/access.2019.2960064
Data-Driven Nonzero-Sum Game for Discrete-Time Systems Using Off-Policy Reinforcement Learning

Abstract: In this paper, we develop a data-driven algorithm to learn the Nash equilibrium solution for a two-player non-zero-sum (NZS) game with completely unknown linear discrete-time dynamics based on off-policy reinforcement learning (RL). This algorithm solves the coupled algebraic Riccati equations (CARE) forward in time in a model-free manner by using online measured data. We first derive the CARE for solving the two-player NZS game. Then, a model-free off-policy RL method is developed to obviate the requirement of comp…
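The paper's contribution is learning this Nash solution model-free from measured data. As a rough point of reference for what is being solved, here is a model-based policy-iteration sketch for a hypothetical two-player discrete-time LQ nonzero-sum game; all matrices are illustrative values, not taken from the paper, and this simple best-response iteration is not guaranteed to converge in general.

```python
import numpy as np

# Hypothetical two-state system with one input channel per player
# (example values only, not from the paper).
A  = np.array([[0.9, 0.1],
               [0.0, 0.8]])          # open-loop dynamics (stable)
B1 = np.array([[1.0], [0.0]])        # player 1 input channel
B2 = np.array([[0.0], [1.0]])        # player 2 input channel
Q1, Q2 = np.eye(2), 2.0 * np.eye(2)  # each player's state weight
R1, R2 = np.eye(1), np.eye(1)        # each player's control weight

def dlyap(Ac, Qt, iters=500):
    """Fixed-point solve of P = Ac.T @ P @ Ac + Qt (requires stable Ac)."""
    P = Qt.copy()
    for _ in range(iters):
        P = Ac.T @ P @ Ac + Qt
    return P

K1 = np.zeros((1, 2))
K2 = np.zeros((1, 2))
for _ in range(50):
    Ac = A - B1 @ K1 - B2 @ K2              # closed loop under current gains
    P1 = dlyap(Ac, Q1 + K1.T @ R1 @ K1)     # policy evaluation, player 1
    P2 = dlyap(Ac, Q2 + K2.T @ R2 @ K2)     # policy evaluation, player 2
    # Policy improvement: each player best-responds to the other's gain
    K1 = np.linalg.solve(R1 + B1.T @ P1 @ B1, B1.T @ P1 @ (A - B2 @ K2))
    K2 = np.linalg.solve(R2 + B2.T @ P2 @ B2, B2.T @ P2 @ (A - B1 @ K1))

rho = max(abs(np.linalg.eigvals(A - B1 @ K1 - B2 @ K2)))
print("closed-loop spectral radius:", rho)  # < 1 if the iteration stabilized
```

At a fixed point, (P1, K1) and (P2, K2) satisfy the coupled Riccati conditions; the paper's off-policy RL method reaches the same solution without knowing A, B1, or B2, using only state and input data.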

Cited by 6 publications (11 citation statements)
References 53 publications
“…Various RL algorithms [15][16][17][18] exhibit diverse computational complexities, leading to varying demands for computational power. Therefore, it becomes imperative to study the method that can reduce the computational power requirements of algorithms without compromising their performance.…”
Section: Introduction (confidence: 99%)
See 1 more Smart Citation
“…Various RL algorithms [15][16][17][18] exhibit diverse computational complexities, leading to varying demands for computational power. Therefore, it becomes imperative to study the method that can reduce the computational power requirements of algorithms without compromising their performance.…”
Section: Introductionmentioning
confidence: 99%
“…The third type is the actor-critic (AC) structure, which is also the most widely used. For example, [16]-[18] solved the NZS problem for different controlled systems. There is also a synchronous RL method [28,29] based on the AC structure, which can continuously and simultaneously adjust the weights of the actor NN and critic NN.…”
Section: Introduction (confidence: 99%)
“…Many practical application scenarios can be modeled as a multi-input system which is controlled by multiple controllers [1,2]. From the perspective of game theory [3-5], the study of optimal control problems for multi-control-input systems has become a hotspot in control theory research [6-10]. Based on the different roles and tasks of each control input, the optimal control problem of multi-input systems can be divided into: fully cooperative (FC) games [11], zero-sum (ZS) games [12], and nonzero-sum (NZS) games [13].…”
Section: Introduction (confidence: 99%)
“…From the perspective of game theory [3-5], the study of optimal control problems for multi-control-input systems has become a hotspot in control theory research [6-10]. Based on the different roles and tasks of each control input, the optimal control problem of multi-input systems can be divided into: fully cooperative (FC) games [11], zero-sum (ZS) games [12], and nonzero-sum (NZS) games [13]. In fact, FC games can be regarded as a special case of NZS games.…”
Section: Introduction (confidence: 99%)
“…Based on the roles and tasks of the inputs, the optimal control of systems with multiple control inputs can be studied from three perspectives: zero-sum (ZS) games, non-zero-sum (NZS) games, and fully cooperative (FC) games [40]. For zero-sum games [41]-[44] and non-zero-sum games [45]-[52], scholars have developed many ADP methods. However, there are few ADP studies on fully cooperative games [53], [54].…”
Section: Introduction (confidence: 99%)