Survey of recent multi-agent reinforcement learning algorithms utilizing centralized training

Sharma, Piyush; Fernández, R. Castillo; Zaroukian, Erin; Dorothy, Michael; Basak, Anjon; Asher, Derrik E.

doi:10.1117/12.2585808

Cited by 24 publications

(17 citation statements)

References 29 publications

(22 reference statements)


“…This helps to achieve high-level information about system dynamics without getting trapped in the difficulties of coordinating information flow between multiple learners. Centralized learning and training can be integrated with either a central decision maker or decentralized excitation [184]. In the second category, the centralized learner learns the value function using the criteria for guiding distributed actors [185].…”

Section: Learning Mechanism (mentioning)

confidence: 99%

A Survey of Adaptive Multi-Agent Networks and Their Applications in Smart Cities

Nezamoddini, Gholami

2022

Smart Cities


The world is moving toward a new connected world in which millions of intelligent processing devices communicate with each other to provide services in transportation, telecommunication, and power grids in the future’s smart cities. Distributed computing is considered one of the efficient platforms for processing and management of massive amounts of data collected by smart devices. This can be implemented by utilizing multi-agent systems (MASs) with multiple autonomous computational entities that have memory and computation capabilities and the possibility of message-passing between them. These systems provide a dynamic and self-adaptive platform for managing distributed large-scale systems, such as the Internet of Things (IoT). Despite the potential applicability of MASs in smart cities, very few practical systems have been deployed using agent-oriented systems. This research surveys the existing techniques presented in the literature that can be utilized for implementing adaptive multi-agent networks in smart cities. The related literature is categorized based on the steps of designing and controlling these adaptive systems. These steps cover the techniques required to define, monitor, plan, and evaluate the performance of an autonomous MAS. At the end, the challenges and barriers for the utilization of these systems in current smart cities, and insights and directions for future research in this domain, are presented.


“…Information fusion is a widely studied topic in robotics for decades [5]. As opposed to the common paradigm of controlling multiple robots using a centralised controller [6] or fusing data from various sources [7], our objective is to gather data from individually operating robots each with a different skill to train a single robot that possesses diverse skills appropriately learned from all other robots by fusing knowledge. Roboticists have attempted to perform knowledge fusion at the perception stage or decision-making stage of the robot autonomy stack.…”

Section: Related Work (mentioning)

confidence: 99%

Renaissance Robot: Optimal Transport Policy Fusion for Learning Diverse Skills

Tan, Senanayake, Ramos

2022

Preprint


Deep reinforcement learning (RL) is a promising approach to solving complex robotics problems. However, the process of learning through trial-and-error interactions is often highly time-consuming, despite recent advancements in RL algorithms. Additionally, the success of RL is critically dependent on how well the reward-shaping function suits the task, which is also time-consuming to design. As agents trained on a variety of robotics problems continue to proliferate, the ability to reuse their valuable learning for new domains becomes increasingly significant. In this paper, we propose a post-hoc technique for policy fusion using Optimal Transport theory as a robust means of consolidating the knowledge of multiple agents that have been trained on distinct scenarios. We further demonstrate that this provides an improved weights initialisation of the neural network policy for learning new tasks, requiring less time and computational resources than either retraining the parent policies or training a new policy from scratch. Ultimately, our results on diverse agents commonly used in deep RL show that specialised knowledge can be unified into a "Renaissance agent", allowing for quicker learning of new skills.


“…In this subsection, we propose a distributed ZOO algorithm with asynchronous sample and update schemes based on the BCD algorithm (12) and the gradient approximation for each agent i. According to (16), we have the following approximation for each agent i at step k:…”

Section: Distributed ZOO Algorithm With Asynchronous Samplings (mentioning)

confidence: 99%

“…Multi-agent networks are one of the most representative systems that have broad applications and usually induce large-size optimization problems [12]. In recent years, distributed zeroth-order convex and non-convex optimizations on multi-agent networks have been extensively studied, e.g., [13]-[17], all of which decompose the original cost function into multiple functions and assign them to the agents.…”

Section: Introduction (mentioning)

confidence: 99%

Asynchronous Distributed Reinforcement Learning for LQR Control via Zeroth-Order Block Coordinate Descent

Jing, Bai, George, et al.

2021

Preprint

Self Cite


Recently introduced distributed zeroth-order optimization (ZOO) algorithms have shown their utility in distributed reinforcement learning (RL). Unfortunately, in the gradient estimation process, almost all of them require random samples with the same dimension as the global variable and/or require evaluation of the global cost function, which may induce high estimation variance for large-scale networks. In this paper, we propose a novel distributed zeroth-order algorithm by leveraging the network structure inherent in the optimization objective, which allows each agent to estimate its local gradient by local cost evaluation independently, without use of any consensus protocol. The proposed algorithm exhibits an asynchronous update scheme, and is designed for stochastic non-convex optimization with a possibly non-convex feasible domain based on the block coordinate descent method. The algorithm is later employed as a distributed model-free RL algorithm for distributed linear quadratic regulator design, where a learning graph is designed to describe the required interaction relationship among agents in distributed learning. We provide an empirical validation of the proposed algorithm to benchmark its performance on convergence rate and variance against a centralized ZOO algorithm.
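
The idea of combining zeroth-order gradient estimates with block coordinate descent can be illustrated with a minimal sketch. This is not the algorithm from the paper above: it uses a generic two-point random-direction estimator, evaluates a single global cost (whereas the abstract describes local cost evaluations per agent), and runs sequentially rather than asynchronously. The names `zo_block_gradient`, `zo_bcd`, and `quadratic`, the step size, and the smoothing radius are all assumptions chosen for illustration.

```python
import math
import random


def zo_block_gradient(cost, x, block, radius, rng):
    """Two-point zeroth-order estimate of the gradient of `cost`
    with respect to the coordinates listed in `block` (hypothetical helper)."""
    # Draw a random direction on the unit sphere of the block's dimension.
    u = [rng.gauss(0.0, 1.0) for _ in block]
    norm = math.sqrt(sum(v * v for v in u))
    u = [v / norm for v in u]

    # Perturb only this block's coordinates, in both directions.
    x_plus, x_minus = list(x), list(x)
    for i, v in zip(block, u):
        x_plus[i] += radius * v
        x_minus[i] -= radius * v

    # Scaled finite difference along u; uses cost values only, no derivatives.
    scale = len(block) * (cost(x_plus) - cost(x_minus)) / (2.0 * radius)
    return [scale * v for v in u]


def zo_bcd(cost, x0, blocks, steps=2000, radius=1e-3, lr=0.05, seed=0):
    """Block coordinate descent where each step updates one randomly
    chosen block (one 'agent') using a zeroth-order gradient estimate."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(steps):
        block = rng.choice(blocks)
        grad = zo_block_gradient(cost, x, block, radius, rng)
        for i, g in zip(block, grad):
            x[i] -= lr * g
    return x


def quadratic(x):
    """Toy cost with its minimum at x = (1, 1, 1, 1) (assumed test problem)."""
    return sum((v - 1.0) ** 2 for v in x)


# Two agents, each owning two coordinates of a four-dimensional variable.
x_final = zo_bcd(quadratic, [0.0] * 4, blocks=[[0, 1], [2, 3]])
```

On this toy quadratic the iterate should approach the minimizer, since each block update can only shrink that block's distance to the optimum at the chosen step size. A distributed variant in the spirit of the abstract would replace the global `cost` call with a cost that depends only on an agent's neighborhood.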

