2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
DOI: 10.1109/iros45743.2020.9340876

MAPPER: Multi-Agent Path Planning with Evolutionary Reinforcement Learning in Mixed Dynamic Environments

Cited by 43 publications (29 citation statements). References 20 publications.
“…Table 1 summarizes the DRL multi-robot path planning methods and the advantages and limitations of each method. From the information in Table 1, shared-parameter algorithms such as MADDPG and ME-MADDPG can be used in dynamic and complex environments [1][2][3][4]; decentralized architectures such as DQN and DDQN can be considered in stable environments [5][6][7]; and large robotic systems facing a large number of dynamic obstacles can consider algorithms such as A2C, A3C and TDueling [8][9][10][11]. Their validity was validated on only a few teams of agents.…”
Section: DRL Multi-Robot Path Planning Methods
confidence: 99%
“…The agent learns to choose its actions according to the desired goals by receiving appropriate rewards for its behaviour in the environment. While reinforcement learning often uses simple and sparse rewards, a reward composed of multiple components allows for better adaptation of the learned policy to specific goals of the path planning problem, as seen for example in [16, 17]. Therefore, the reward function is designed to discourage the agent from generating routes that are longer than necessary, encourage one-way drives across a single cell, and keep the number of turns on each path and intersections between successive routes low.…”
Section: Reinforcement-Learning-Based Route Generation
confidence: 99%
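
As a rough illustration of such a multi-component reward, the sketch below combines length, turn, intersection, and revisit penalties into one scalar. The weights and helper quantities (path_len, turns, intersections, revisits) are illustrative assumptions, not values taken from the cited works.

# Hypothetical multi-component reward for grid-based route generation.
def route_reward(path_len, shortest_len, turns, intersections, revisits,
                 w_len=1.0, w_turn=0.2, w_cross=0.5, w_revisit=0.3):
    """Combine several penalties into a single scalar reward:
    - extra length beyond the shortest known route,
    - turns along the route,
    - intersections with previously generated routes,
    - repeated drives across the same cell (encourages one-way drives).
    """
    length_penalty = w_len * max(0, path_len - shortest_len)
    turn_penalty = w_turn * turns
    cross_penalty = w_cross * intersections
    revisit_penalty = w_revisit * revisits
    return -(length_penalty + turn_penalty + cross_penalty + revisit_penalty)
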
“…For settings where multiple robots share their movement space, RL approaches are used in a multi-agent form. In both [15, 16], multiple RL agents simultaneously plan their local paths in a distributed setting. The first work incorporates expert knowledge into the learning phase through imitation learning (IL), while the second improves convergence to the optimal policy with an evolutionary training approach.…”
Section: Introduction
confidence: 99%
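
The evolutionary training idea referenced above can be sketched as a simple population-based loop: evaluate a population of policy parameter vectors, keep the best, and refill the population with mutated copies of the elites. The evaluate_policy hook, population size, and mutation scale are assumptions for illustration and do not reproduce the training procedure of either cited paper.

import numpy as np

def evolve(evaluate_policy, dim, pop_size=16, elites=4, sigma=0.05, generations=100):
    # Population of flat policy parameter vectors.
    population = [np.random.randn(dim) * 0.1 for _ in range(pop_size)]
    for _ in range(generations):
        # Score each candidate by its rollout return (environment-specific hook).
        returns = [evaluate_policy(theta) for theta in population]
        ranked = [p for _, p in sorted(zip(returns, population),
                                       key=lambda x: x[0], reverse=True)]
        parents = ranked[:elites]
        # Refill the population with Gaussian-perturbed copies of the elites.
        population = parents + [
            parents[i % elites] + sigma * np.random.randn(dim)
            for i in range(pop_size - elites)
        ]
    # Best parameters from the last evaluated generation.
    return ranked[0]
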
“…Later on, Liu et al. [12] proposed MAPPER, the baseline method of our study, for MAPF under the DTDE architecture. In this method, each agent models the behavior of dynamic obstacles with an image-based representation and then feeds its local observations into its own actor and critic networks for learning.…”
Section: MAPF Based on Deep Reinforcement Learning
confidence: 99%
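
A minimal sketch of such an image-based actor-critic is given below, assuming a stacked-channel local observation (e.g. static obstacles, dynamic-obstacle history, goal direction) encoded by a small CNN with separate policy and value heads. The channel layout, observation size, and action count are illustrative assumptions, not the exact MAPPER architecture.

import torch
import torch.nn as nn

class LocalActorCritic(nn.Module):
    """CNN encoder over an agent's local image-like observation,
    with a policy (actor) head and a value (critic) head."""
    def __init__(self, in_channels=3, obs_size=15, n_actions=5):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Flatten(),
        )
        feat_dim = 64 * obs_size * obs_size
        self.actor = nn.Linear(feat_dim, n_actions)   # action logits
        self.critic = nn.Linear(feat_dim, 1)          # state value

    def forward(self, obs):
        feat = self.encoder(obs)
        return self.actor(feat), self.critic(feat)

# Example: one agent's 15x15 local observation with 3 channels.
# obs = torch.zeros(1, 3, 15, 15); logits, value = LocalActorCritic()(obs)
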
“…Most of the above communication-free methods are based on the DTDE framework. In this framework, each agent plans actions based only on its own sequence of observations and its own policy [12]. This is feasible in a non-crowded environment, where each agent's decision does not have to account for other agents: the combination of the optimal individual actions is then the optimal joint action, no inter-agent communication is needed, and the distributed approach is highly efficient [9]. However, in a dynamic, crowded environment the interaction between the agents and the environment makes it non-stationary, so the agents must overcome the resulting low stability and poor robustness of the planned strategies [13].…”
Section: MAPF Based on Deep Reinforcement Learning
confidence: 99%
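
Decentralized execution under DTDE amounts to each agent mapping its own observation to an action with its own policy, with no communication. The sketch below assumes a dictionary-style multi-agent environment interface (one observation and one policy per agent id); this interface is an illustrative convention, not an API from the cited works.

def decentralized_step(env, policies, observations):
    # Every agent decides independently from its local observation;
    # the joint action is just the combination of individual decisions.
    actions = {aid: policies[aid](obs) for aid, obs in observations.items()}
    # The (assumed) environment applies the joint action and returns
    # the next per-agent observations, rewards, and done flags.
    return env.step(actions)
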