2021 · DOI: 10.1002/int.22778

ME‐MADDPG: An efficient learning‐based motion planning method for multiple agents in complex environments

Abstract: Developing efficient motion policies for multiple agents is a challenge in decentralized, dynamic situations where each agent plans its own path without knowing the policies of the other agents involved. This paper presents an efficient learning‐based motion planning method for multiagent systems. It adopts the framework of multiagent deep deterministic policy gradient (MADDPG) to directly map partially observed information to motion commands for multiple agents. To improve the efficiency of MADDPG in sample uti…
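The abstract describes the standard MADDPG structure the method builds on: each agent's actor maps only its own partial observation to a motion command, while training uses a centralized critic that sees every agent's observations and actions. The sketch below illustrates that structure only; it is not the paper's ME‐MADDPG implementation, and the agent count, network sizes, and variable names are illustrative assumptions.

```python
# Minimal sketch of MADDPG's decentralized-actor / centralized-critic layout
# (illustrative assumptions only, not the paper's ME-MADDPG code).
import torch
import torch.nn as nn

N_AGENTS, OBS_DIM, ACT_DIM = 3, 16, 2   # assumed sizes for illustration


class Actor(nn.Module):
    """Each agent maps only its own partial observation to a motion command."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, ACT_DIM), nn.Tanh())

    def forward(self, obs):
        return self.net(obs)


class CentralCritic(nn.Module):
    """During training, the critic scores the joint observations and actions of all agents."""
    def __init__(self):
        super().__init__()
        joint_dim = N_AGENTS * (OBS_DIM + ACT_DIM)
        self.net = nn.Sequential(nn.Linear(joint_dim, 128), nn.ReLU(),
                                 nn.Linear(128, 1))

    def forward(self, joint_obs, joint_act):
        return self.net(torch.cat([joint_obs, joint_act], dim=-1))


actors = [Actor() for _ in range(N_AGENTS)]
critic = CentralCritic()

# One fabricated batch of joint observations, just to show the data flow.
batch = 8
obs = torch.randn(batch, N_AGENTS, OBS_DIM)
acts = torch.stack([actors[i](obs[:, i]) for i in range(N_AGENTS)], dim=1)
q_value = critic(obs.reshape(batch, -1), acts.reshape(batch, -1))
print(q_value.shape)  # torch.Size([8, 1])
```

In full MADDPG training, each agent also keeps target networks and draws batches from a shared replay buffer, and each actor is updated through the centralized critic's gradient; only the decentralized actors are needed at execution time.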

Cited by 22 publications (3 citation statements) · References 28 publications (40 reference statements)
“…There has been a considerable extension to the MADDPG paradigm, where recurrent DPGs were used for complex environments such as Cognitive Electronic Warfare [18] and partially observable environments for communication systems [19]. A mixed-environment approach was taken for complex environments using MADDPG [20], whereas a decomposed approach was introduced for learning multi-agent policies for UAV clusters to build a connected communication network [21]. Further, this set of algorithms has also been discussed in the context of smart grids for edge technology [22] and has been shown to perform considerably well compared to other state-of-the-art methods.…”
Section: B. Multi-agent Deep Deterministic Policy Gradients (With Prio…)
Mentioning confidence: 99%
“…The CM relies on Reinforcement Learning (RL)-based methods that use iterative algorithms to converge to an optimal navigation policy [30][31][32]. It is common in MAS to use methods based on Deep Reinforcement Learning (DRL), a powerful tool that combines neural networks with RL algorithms and allows each agent to learn from its interactions with the environment [33][34][35][36][37][38]. Despite the effectiveness of RL-based methods, their main disadvantage in MAS is the computational complexity and the large amount of data required to converge to the global policy.…”
Section: Introduction
Mentioning confidence: 99%
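The statement above summarizes DRL only at a high level, so here is a minimal sketch of the agent–environment interaction loop that such methods are built on; the toy one-dimensional environment, reward values, and random action choice are purely illustrative assumptions and do not correspond to any cited method.

```python
# Minimal sketch of the interaction loop a DRL agent learns from (illustrative only).
import random


class ToyEnv:
    """A 1-D corridor: the agent starts at 0 and is rewarded for reaching position 5."""
    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):               # action: -1 (step left) or +1 (step right)
        self.pos += action
        done = self.pos >= 5
        reward = 1.0 if done else -0.1    # small per-step penalty encourages short paths
        return self.pos, reward, done


env = ToyEnv()
state, done = env.reset(), False
transitions = []                          # the experience a DRL learner would train on
for _ in range(200):                      # step cap so the random policy always terminates
    action = random.choice([-1, 1])       # a learned (neural) policy would replace this
    next_state, reward, done = env.step(action)
    transitions.append((state, action, reward, next_state, done))
    state = next_state
    if done:
        break
print(f"collected {len(transitions)} transitions")
```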
“…Trajectory generation methods rooted in PDEs have the advantage that they are easy to implement, entirely interpretable, and can provide theoretical guarantees regarding optimality, robustness, and other concerns. This distinguishes them from sampling- and learning-based algorithms (for example, [15,16,17,18,19]), which often sacrifice interpretability for efficiency. The main drawbacks of the PDE-based methods are their lack of efficiency and scalability.…”
Section: Introduction
Mentioning confidence: 99%