2015
DOI: 10.1007/s11768-015-3203-x

Discrete-time dynamic graphical games: model-free reinforcement learning solution

Cited by 60 publications (34 citation statements, all classified as mentioning; citing publications span 2015-2023)
References 30 publications
“…To solve zero-sum differential games, Mehraeen et al [12], Sun et al [13,14], and Zhu et al [15] used an iterative approach to approximate the Hamilton-Jacobi-Isaacs equation with neural networks. On the other hand, Abouheaf and Lewis et al [16,17] applied a policy iteration algorithm to learn the Nash solution for multiplayer cooperative games. As for constrained problems, inspired by the form of the optimal cost-to-go in [18], a new value function involving Lagrange multipliers was introduced by Heydari and Balakrishnan [19] to handle terminal constraints; Kim [20] also successfully applied this idea to finite-horizon spacecraft control, while Adhyaru et al [21] and Xu et al [22] used a nonquadratic term in the performance function to deal with magnitude constraints on the control input.…”
Section: Introduction (citation type: mentioning; confidence: 99%)
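For context, the Hamilton-Jacobi-Isaacs (HJI) equation referenced above can be written as follows (a minimal sketch, assuming continuous-time dynamics \(\dot{x} = f(x) + g(x)u + k(x)w\) with control \(u\), disturbance \(w\), attenuation level \(\gamma\), and quadratic costs; this notation is assumed here, not taken from the cited works):

\[
0 = \min_{u}\max_{w}\Big[\, x^{\top}Qx + u^{\top}Ru - \gamma^{2}w^{\top}w + \nabla V(x)^{\top}\big(f(x)+g(x)u+k(x)w\big) \Big],
\]
with the saddle-point (Nash) policies
\[
u^{*}(x) = -\tfrac{1}{2}R^{-1}g(x)^{\top}\nabla V(x), \qquad
w^{*}(x) = \tfrac{1}{2\gamma^{2}}k(x)^{\top}\nabla V(x).
\]

Since \(V\) rarely has a closed form, the cited works approximate it with a neural network and iterate between evaluating and improving the policies.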
“…Typical optimal control methods solve the underlying Hamilton-Jacobi-Bellman (HJB) equation of the dynamical system by applying optimality principles [22,23]. An optimal control problem is usually formulated as an optimization problem with a cost function that encodes the optimization objectives, together with a mathematical process to find the respective optimal strategies [6,7,18,22-28]. To implement the optimal control solutions stemming from ADP approaches, numerous solution frameworks based on combinations of Reinforcement Learning (RL) and adaptive critics have been considered [1,5,18,25,27].…”
Section: Introduction (citation type: mentioning; confidence: 99%)
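As a concrete illustration of the policy-iteration pattern behind these ADP frameworks, the following is a minimal sketch for a discrete-time linear-quadratic problem (toy system matrices assumed for illustration; this is not code from the cited papers). Policy evaluation solves a Lyapunov equation for the cost matrix P of the current gain K; policy improvement extracts the greedy gain from the Bellman minimization:

import numpy as np
from scipy.linalg import solve_discrete_lyapunov, solve_discrete_are

# Toy open-loop-stable system x_{k+1} = A x_k + B u_k (illustrative values),
# so the initial policy K = 0 is admissible.
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
B = np.array([[0.0],
              [0.1]])
Q, R = np.eye(2), np.eye(1)

K = np.zeros((1, 2))            # initial stabilizing policy u = -K x
for _ in range(50):
    A_cl = A - B @ K            # closed-loop dynamics under the current policy
    # Policy evaluation: P solves P = A_cl^T P A_cl + Q + K^T R K
    P = solve_discrete_lyapunov(A_cl.T, Q + K.T @ R @ K)
    # Policy improvement: greedy gain from the Bellman minimization
    K_new = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    if np.allclose(K_new, K, atol=1e-10):
        break
    K = K_new

# Cross-check: the converged P matches the discrete algebraic Riccati solution
assert np.allclose(P, solve_discrete_are(A, B, Q, R), atol=1e-6)

Note that this sketch is model-based (it uses A and B); the model-free variants discussed in this paper's context replace the Lyapunov solve with value estimates learned from measured data.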
“…An actor-critic solution framework is adopted for an online policy iteration process with a weighted-derivative performance index in [33]. A model-free optimal solution for graphical games is implemented using only one critic structure per agent in [25]. Recent state-of-the-art adaptive-critic implementations of reinforcement learning solutions for feedback control problems are surveyed in [36].…”
Section: Introduction (citation type: mentioning; confidence: 99%)
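A minimal sketch of why a single critic can yield a model-free solution, using the standard action-dependent (Q-) value function for the linear-quadratic case (notation assumed here, not taken from [25]): with stage cost \(r(x,u) = x^{\top}Qx + u^{\top}Ru\), the critic satisfies the Bellman equation

\[
Q^{\mu}(x_k, u_k) = r(x_k, u_k) + Q^{\mu}\big(x_{k+1}, \mu(x_{k+1})\big),
\qquad
Q^{\mu}(x,u) = \begin{bmatrix} x \\ u \end{bmatrix}^{\top}
\begin{bmatrix} H_{xx} & H_{xu} \\ H_{ux} & H_{uu} \end{bmatrix}
\begin{bmatrix} x \\ u \end{bmatrix},
\]

and the improved policy follows from \(\partial Q^{\mu}/\partial u = 0\) as \(u = -H_{uu}^{-1}H_{ux}\,x\). Because the kernel \(H\) can be identified by least squares from measured tuples \((x_k, u_k, x_{k+1})\), no knowledge of the system dynamics is required, which is what makes the single-critic structure model-free.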
“…Approximate Dynamic Programming (ADP) approaches are used to find approximate solutions to Dynamic Programming problems in [2], [5], [6]. These approaches combine knowledge from Dynamic Programming, Reinforcement Learning (RL), and Adaptive Critics [2]-[8]. ADP approaches are used in cooperative control, computational intelligence, decision making, and applied mathematics [9], [10].…”
Section: Introduction (citation type: mentioning; confidence: 99%)
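A minimal sketch of the recursion these ADP approaches approximate, with an assumed critic parameterization (not notation from the cited works): the Bellman optimality equation

\[
V^{*}(x_k) = \min_{u_k}\big[\, r(x_k, u_k) + V^{*}(x_{k+1}) \,\big]
\]

is intractable to solve exactly for general systems, so an adaptive critic replaces \(V^{*}\) with \(\hat{V}(x) = W^{\top}\phi(x)\) for a chosen basis \(\phi\) and tunes the weights \(W\) to drive the Bellman residual \(\hat{V}(x_k) - r(x_k,u_k) - \hat{V}(x_{k+1})\) toward zero along sampled trajectories.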