2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
DOI: 10.1109/iros45743.2020.9340876

MAPPER: Multi-Agent Path Planning with Evolutionary Reinforcement Learning in Mixed Dynamic Environments

Cited by 43 publications (29 citation statements). References 20 publications.
“…Table 1 summarizes the DRL multi-robot path planning methods and the advantages and limitations of each method. From the information in Table 1, shared-parameter algorithms such as MADDPG and ME-MADDPG can be used in dynamic and complex environments [1][2][3][4]; decentralized architectures such as DQN and DDQN can be considered in stable environments [5][6][7]; and large robotic systems facing a large number of dynamic obstacles can consider algorithms such as A2C, A3C and TDueling [8][9][10][11]. Their validity was validated on only a few teams of agents.…”
Section: DRL Multi-Robot Path Planning Methods
confidence: 99%
“…The agent learns to choose its actions according to the desired goals by receiving appropriate rewards for its behaviour in the environment. While reinforcement learning often uses simple and sparse rewards, a reward composed of multiple components allows for better adaptation of the learned policy to specific goals of the path planning problem, as seen for example in [16, 17]. Therefore, the reward function is designed to discourage the agent from generating routes that are longer than necessary, encourage one-way drives across a single cell, and keep the number of turns on each path and intersections between successive routes low.…”
Section: Reinforcement-Learning-Based Route Generation
confidence: 99%
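
As a rough illustration of such a multi-component reward, the sketch below combines length, turn, intersection, and revisit penalties into one scalar. The weights and helper quantities (path_len, turns, intersections, revisits) are illustrative assumptions, not values taken from the cited works.

# Hypothetical multi-component reward for grid-based route generation.
def route_reward(path_len, shortest_len, turns, intersections, revisits,
                 w_len=1.0, w_turn=0.2, w_cross=0.5, w_revisit=0.3):
    """Combine several penalties into a single scalar reward:
    - extra length beyond the shortest known route,
    - turns along the route,
    - intersections with previously generated routes,
    - repeated drives across the same cell (encourages one-way drives).
    """
    length_penalty = w_len * max(0, path_len - shortest_len)
    turn_penalty = w_turn * turns
    cross_penalty = w_cross * intersections
    revisit_penalty = w_revisit * revisits
    return -(length_penalty + turn_penalty + cross_penalty + revisit_penalty)
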
“…For settings where multiple robots share their movement space, RL approaches are used in a multi-agent form. In both [15, 16], multiple RL agents simultaneously plan their local paths in a distributed setting. The first work incorporates expert knowledge into the learning phase through imitation learning (IL), while the second improves convergence to the optimal policy with an evolutionary training approach.…”
Section: Introduction
confidence: 99%
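
The evolutionary training idea referenced above can be sketched as a simple population-based loop: evaluate a population of policy parameter vectors, keep the best, and refill the population with mutated copies of the elites. The evaluate_policy hook, population size, and mutation scale are assumptions for illustration and do not reproduce the training procedure of either cited paper.

import numpy as np

def evolve(evaluate_policy, dim, pop_size=16, elites=4, sigma=0.05, generations=100):
    # Population of flat policy parameter vectors.
    population = [np.random.randn(dim) * 0.1 for _ in range(pop_size)]
    for _ in range(generations):
        # Score each candidate by its rollout return (environment-specific hook).
        returns = [evaluate_policy(theta) for theta in population]
        ranked = [p for _, p in sorted(zip(returns, population),
                                       key=lambda x: x[0], reverse=True)]
        parents = ranked[:elites]
        # Refill the population with Gaussian-perturbed copies of the elites.
        population = parents + [
            parents[i % elites] + sigma * np.random.randn(dim)
            for i in range(pop_size - elites)
        ]
    # Best parameters from the last evaluated generation.
    return ranked[0]
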
“…Later on, Liu et al. [12] proposed MAPPER, the baseline method of our study, for MAPF under the DTDE architecture. In this method, each agent models the behavior of dynamic obstacles with an image-based representation and then feeds its local observations into its own actor and critic networks for learning.…”
Section: MAPF Based on Deep Reinforcement Learning
confidence: 99%
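
A minimal sketch of such an image-based actor-critic is given below, assuming a stacked-channel local observation (e.g. static obstacles, dynamic-obstacle history, goal direction) encoded by a small CNN with separate policy and value heads. The channel layout, observation size, and action count are illustrative assumptions, not the exact MAPPER architecture.

import torch
import torch.nn as nn

class LocalActorCritic(nn.Module):
    """CNN encoder over an agent's local image-like observation,
    with a policy (actor) head and a value (critic) head."""
    def __init__(self, in_channels=3, obs_size=15, n_actions=5):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Flatten(),
        )
        feat_dim = 64 * obs_size * obs_size
        self.actor = nn.Linear(feat_dim, n_actions)   # action logits
        self.critic = nn.Linear(feat_dim, 1)          # state value

    def forward(self, obs):
        feat = self.encoder(obs)
        return self.actor(feat), self.critic(feat)

# Example: one agent's 15x15 local observation with 3 channels.
# obs = torch.zeros(1, 3, 15, 15); logits, value = LocalActorCritic()(obs)
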
“…Most of the above communication-free methods are based on the DTDE framework. In this framework, each agent plans actions based only on its own sequence of observations and its own policy [12]. This is feasible in a non-crowded environment, where each agent's decision does not have to account for other agents: the combination of the optimal individual actions is then the optimal joint action, no inter-agent communication is needed, and the distributed approach is highly efficient [9]. However, in a dynamic, crowded environment the interaction between the agents and the environment makes it non-stationary, so the agents must overcome the resulting low stability and poor robustness of the planned strategies [13].…”
Section: MAPF Based on Deep Reinforcement Learning
confidence: 99%
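
Decentralized execution under DTDE amounts to each agent mapping its own observation to an action with its own policy, with no communication. The sketch below assumes a dictionary-style multi-agent environment interface (one observation and one policy per agent id); this interface is an illustrative convention, not an API from the cited works.

def decentralized_step(env, policies, observations):
    # Every agent decides independently from its local observation;
    # the joint action is just the combination of individual decisions.
    actions = {aid: policies[aid](obs) for aid, obs in observations.items()}
    # The (assumed) environment applies the joint action and returns
    # the next per-agent observations, rewards, and done flags.
    return env.step(actions)
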