2015
DOI: 10.1007/s11768-015-3203-x

Discrete-time dynamic graphical games: model-free reinforcement learning solution

Cited by 60 publications (34 citation statements, all classified as mentioning; citing publications span 2015-2023)
References 30 publications
“…To solve zero-sum differential games, Mehraeen et al [12], Sun et al [13,14], and Zhu et al [15] used an iterative approach to approximate the Hamilton-Jacobi-Isaacs equation with neural networks. On the other hand, Abouheaf and Lewis et al [16,17] applied a policy iteration algorithm to learn the Nash solution for multiplayer cooperative games. As for constrained problems, inspired by the form of the optimal cost-to-go in [18], a new value function involving Lagrange multipliers was introduced by Heydari and Balakrishnan [19] to handle terminal constraints; Kim [20] also successfully applied this idea to finite-horizon spacecraft control, while Adhyaru et al [21] and Xu et al [22] used a nonquadratic term in the performance function to deal with magnitude constraints on the control input.…”
Section: Introduction (citation type: mentioning; confidence: 99%)
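For context, the Hamilton-Jacobi-Isaacs (HJI) equation referenced above can be written as follows (a minimal sketch, assuming continuous-time dynamics \(\dot{x} = f(x) + g(x)u + k(x)w\) with control \(u\), disturbance \(w\), attenuation level \(\gamma\), and quadratic costs; this notation is assumed here, not taken from the cited works):

\[
0 = \min_{u}\max_{w}\Big[\, x^{\top}Qx + u^{\top}Ru - \gamma^{2}w^{\top}w + \nabla V(x)^{\top}\big(f(x)+g(x)u+k(x)w\big) \Big],
\]
with the saddle-point (Nash) policies
\[
u^{*}(x) = -\tfrac{1}{2}R^{-1}g(x)^{\top}\nabla V(x), \qquad
w^{*}(x) = \tfrac{1}{2\gamma^{2}}k(x)^{\top}\nabla V(x).
\]

Since \(V\) rarely has a closed form, the cited works approximate it with a neural network and iterate between evaluating and improving the policies.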
“…Typical optimal control methods solve the underlying Hamilton-Jacobi-Bellman (HJB) equation of the dynamical system by applying optimality principles [22,23]. An optimal control problem is usually formulated as an optimization problem with a cost function that encodes the optimization objectives, together with a mathematical process to find the respective optimal strategies [6,7,18,22-28]. To implement the optimal control solutions stemming from ADP approaches, numerous solution frameworks based on combinations of Reinforcement Learning (RL) and adaptive critics have been considered [1,5,18,25,27].…”
Section: Introduction (citation type: mentioning; confidence: 99%)
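As a concrete illustration of the policy-iteration pattern behind these ADP frameworks, the following is a minimal sketch for a discrete-time linear-quadratic problem (toy system matrices assumed for illustration; this is not code from the cited papers). Policy evaluation solves a Lyapunov equation for the cost matrix P of the current gain K; policy improvement extracts the greedy gain from the Bellman minimization:

import numpy as np
from scipy.linalg import solve_discrete_lyapunov, solve_discrete_are

# Toy open-loop-stable system x_{k+1} = A x_k + B u_k (illustrative values),
# so the initial policy K = 0 is admissible.
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
B = np.array([[0.0],
              [0.1]])
Q, R = np.eye(2), np.eye(1)

K = np.zeros((1, 2))            # initial stabilizing policy u = -K x
for _ in range(50):
    A_cl = A - B @ K            # closed-loop dynamics under the current policy
    # Policy evaluation: P solves P = A_cl^T P A_cl + Q + K^T R K
    P = solve_discrete_lyapunov(A_cl.T, Q + K.T @ R @ K)
    # Policy improvement: greedy gain from the Bellman minimization
    K_new = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    if np.allclose(K_new, K, atol=1e-10):
        break
    K = K_new

# Cross-check: the converged P matches the discrete algebraic Riccati solution
assert np.allclose(P, solve_discrete_are(A, B, Q, R), atol=1e-6)

Note that this sketch is model-based (it uses A and B); the model-free variants discussed in this paper's context replace the Lyapunov solve with value estimates learned from measured data.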
“…An actor-critic solution framework is adopted for an online policy iteration process with a weighted-derivative performance index in [33]. A model-free optimal solution for graphical games is implemented using only one critic structure per agent in [25]. Recent state-of-the-art adaptive-critic implementations of reinforcement learning solutions for feedback control problems are surveyed in [36].…”
Section: Introduction (citation type: mentioning; confidence: 99%)
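A minimal sketch of why a single critic can yield a model-free solution, using the standard action-dependent (Q-) value function for the linear-quadratic case (notation assumed here, not taken from [25]): with stage cost \(r(x,u) = x^{\top}Qx + u^{\top}Ru\), the critic satisfies the Bellman equation

\[
Q^{\mu}(x_k, u_k) = r(x_k, u_k) + Q^{\mu}\big(x_{k+1}, \mu(x_{k+1})\big),
\qquad
Q^{\mu}(x,u) = \begin{bmatrix} x \\ u \end{bmatrix}^{\top}
\begin{bmatrix} H_{xx} & H_{xu} \\ H_{ux} & H_{uu} \end{bmatrix}
\begin{bmatrix} x \\ u \end{bmatrix},
\]

and the improved policy follows from \(\partial Q^{\mu}/\partial u = 0\) as \(u = -H_{uu}^{-1}H_{ux}\,x\). Because the kernel \(H\) can be identified by least squares from measured tuples \((x_k, u_k, x_{k+1})\), no knowledge of the system dynamics is required, which is what makes the single-critic structure model-free.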
“…Approximate Dynamic Programming (ADP) approaches are used to find approximate solutions to Dynamic Programming problems in [2], [5], [6]. These approaches combine knowledge from Dynamic Programming, Reinforcement Learning (RL), and Adaptive Critics [2]-[8]. ADP approaches are used in cooperative control, computational intelligence, decision making, and applied mathematics [9], [10].…”
Section: Introduction (citation type: mentioning; confidence: 99%)
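A minimal sketch of the recursion these ADP approaches approximate, with an assumed critic parameterization (not notation from the cited works): the Bellman optimality equation

\[
V^{*}(x_k) = \min_{u_k}\big[\, r(x_k, u_k) + V^{*}(x_{k+1}) \,\big]
\]

is intractable to solve exactly for general systems, so an adaptive critic replaces \(V^{*}\) with \(\hat{V}(x) = W^{\top}\phi(x)\) for a chosen basis \(\phi\) and tunes the weights \(W\) to drive the Bellman residual \(\hat{V}(x_k) - r(x_k,u_k) - \hat{V}(x_{k+1})\) toward zero along sampled trajectories.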