2022
DOI: 10.1109/tnnls.2021.3069728

Reinforcement Learning-Based Cooperative Optimal Output Regulation via Distributed Adaptive Internal Model

Cited by 57 publications (14 citation statements). References 51 publications.
“…With the increasing accessibility of low-cost, high-performance computing technology, DRL has been effectively applied to various areas. With the help of neural networks as function approximators, DRL can handle high-dimensional state and action spaces [25, 26, 27, 28, 29, 30], as is the case with autonomous vehicle platoons [31]. Using a model-free DRL algorithm eliminates the need to model the environment’s complicated dynamics (the transition function/probability distribution).…”
Section: Literature Review
confidence: 99%
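
A minimal sketch of the function-approximation idea in the passage above: a small neural network maps a continuous state vector directly to one value per action, so no table over discretized states is ever stored. The framework (PyTorch), the layer sizes, and names such as QNetwork are illustrative assumptions, not taken from the cited works.

```python
# Minimal sketch: a neural network as a Q-function approximator.
# All sizes and names (QNetwork, state_dim, num_actions) are illustrative,
# not taken from the cited papers.
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    def __init__(self, state_dim: int, num_actions: int, hidden: int = 64):
        super().__init__()
        # Maps a continuous state vector to one Q-value per discrete action,
        # so no table over states is needed even for high-dimensional states.
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_actions),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

# Example: a platoon-like state (gaps and relative speeds of several vehicles)
# handled by one network, regardless of how finely a tabular method would have
# to discretize it.
q = QNetwork(state_dim=8, num_actions=5)
greedy_action = q(torch.randn(1, 8)).argmax(dim=-1)
```
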
“…An RL algorithm can be formulated as a Markov decision process (MDP) [25, 28, 29, 30], a mathematical framework for sequential decision making under uncertainty. The MDP formulation is used to choose the appropriate action given a complete set of observations [52].…”
Section: Preliminary Study on Reinforcement Learning
confidence: 99%
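
To make the MDP terminology above concrete, here is a minimal sketch of a finite MDP solved by value iteration; the two-state transition probabilities and rewards are invented purely for illustration and are unrelated to the paper.

```python
# Minimal sketch of a finite Markov decision process (MDP) solved by value
# iteration. The transition probabilities P[s, a, s'] and rewards R[s, a]
# below are made up for illustration.
import numpy as np

num_states, num_actions, gamma = 2, 2, 0.9
P = np.array([  # P[s, a, s'] = probability of landing in s' after action a in s
    [[0.8, 0.2], [0.1, 0.9]],
    [[0.5, 0.5], [0.0, 1.0]],
])
R = np.array([  # R[s, a] = expected immediate reward
    [1.0, 0.0],
    [0.0, 2.0],
])

V = np.zeros(num_states)
for _ in range(500):
    # Bellman optimality backup: V(s) = max_a [ R(s,a) + gamma * sum_s' P(s,a,s') V(s') ]
    Q = R + gamma * P @ V          # shape (num_states, num_actions)
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new

policy = Q.argmax(axis=1)          # greedy action in each state
print("optimal values:", V, "greedy policy:", policy)
```
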
“…These methods do not require complete knowledge of the system model; rather, they use information about the state, input, and output of each agent to establish feasibility of solutions and guarantees on the convergence of reinforcement learning-based algorithms. Input and state data were used in an online manner to design a distributed control algorithm that solves a cooperative optimal output regulation problem in leader-follower systems in [30]. Information obtained from the trajectories of each player was used in [31] to develop real-time solutions to multi-player games through the design of an actor-critic-based adaptive learning algorithm.…”
Section: Related Work
confidence: 99%
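
As a rough, hedged illustration of learning a controller from input/state data rather than from a model (in the spirit of the passage above, not a reproduction of the algorithms in [30] or [31]): the sketch below runs model-free policy iteration for a toy discrete-time LQR problem. The plant matrices A and B are used only to simulate data and are never read by the learner; all numbers are invented.

```python
# Minimal sketch (not the cited algorithms) of model-free policy iteration for
# a discrete-time LQR problem: the learner sees only (state, input, cost,
# next state) data; the matrices A and B are used solely to simulate the plant.
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[0.9, 0.1], [0.0, 0.9]])   # hypothetical plant, hidden from the learner
B = np.array([[0.0], [0.1]])
Qc, Rc = np.eye(2), np.eye(1)            # quadratic stage-cost weights
n, m = 2, 1
K = np.zeros((m, n))                     # initial gain (stabilizing here since A is stable)

def quad_features(z):
    """Features so that z^T H z (H symmetric) is linear in the packed entries of H."""
    i, j = np.triu_indices(len(z))
    return np.where(i == j, 1.0, 2.0) * np.outer(z, z)[i, j]

for _ in range(8):
    Phi, y, x = [], [], rng.standard_normal(n)
    for _ in range(400):
        u = -K @ x + 0.5 * rng.standard_normal(m)       # exploratory input
        x_next = A @ x + B @ u
        z = np.concatenate([x, u])
        z_next = np.concatenate([x_next, -K @ x_next])  # on-policy action at next state
        # Bellman equation for the current policy's Q-function:
        #   z^T H z - z_next^T H z_next = x^T Qc x + u^T Rc u
        Phi.append(quad_features(z) - quad_features(z_next))
        y.append(x @ Qc @ x + u @ Rc @ u)
        x = x_next
    theta, *_ = np.linalg.lstsq(np.array(Phi), np.array(y), rcond=None)

    H = np.zeros((n + m, n + m))
    H[np.triu_indices(n + m)] = theta
    H = H + H.T - np.diag(np.diag(H))                   # rebuild symmetric H from data

    K = np.linalg.solve(H[n:, n:], H[n:, :n])           # policy improvement: u = -K x

print("learned feedback gain K:\n", K)
```
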
“…In [23], a model-free RL-based method is proposed to design suboptimal adaptive controllers for linear continuous-time multi-agent systems, in which information about the leader's system matrix is required to design the controller for each follower and the controller gains depend on the communication graph. Different from [23], [25] proposes an effective optimal algorithm for discrete-time multi-agent systems, where the leader's system matrix is required in designing each follower's controller and the modulus of each eigenvalue of the leader's system matrix is required to be equal to 1; moreover, the communication graph is required to be acyclic, i.e., a digraph with no loops. [26] proposes an RL-based algorithm to solve the linear continuous-time COORP without requiring knowledge of the followers' system models, while knowledge of the leader's system model is required for each follower and the eigenvalues of each follower need to be simple with zero real parts. In [27], distributed observers and adaptive controllers are designed for each follower to solve the COORP of nonlinear continuous-time multi-agent systems with unity relative degree, in which the exosystem is required to be stable and all the eigenvalues of each follower are required to be semi-simple with zero real parts.…”
Section: Introduction
confidence: 99%
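
For context on the terms used in this comparison (leader/exosystem system matrix, eigenvalue conditions), the generic textbook form of the linear cooperative output regulation setup is sketched below; the notation is illustrative and not necessarily that of the papers being compared.

```latex
% Generic (textbook) linear cooperative output regulation setup; the notation
% is illustrative, not taken verbatim from the cited papers.
\begin{aligned}
\dot{v}   &= S v,                        && \text{leader / exosystem} \\
\dot{x}_i &= A_i x_i + B_i u_i + E_i v,  && \text{follower } i \\
e_i       &= C_i x_i + D_i u_i + F_i v,  && \text{tracking error to be driven to } 0,
\end{aligned}
\qquad
\begin{aligned}
X_i S &= A_i X_i + B_i U_i + E_i, \\
0     &= C_i X_i + D_i U_i + F_i
\end{aligned}
\quad \text{(regulator equations).}
```

Conditions of the kind quoted above (eigenvalues with zero real parts or unit modulus, simple or semi-simple) typically ensure that the signals generated by the corresponding system matrix are bounded and persistent rather than decaying or diverging, which is what makes the regulation problem nontrivial.
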