PRIMAL$_2$: Pathfinding Via Reinforcement and Imitation Multi-Agent Learning - Lifelong

Damani, Mehul; Luo, Zhiyao; Wenzel, Emerson; Sartoretti, Guillaume

doi:10.1109/lra.2021.3062803

Cited by 82 publications

(62 citation statements)

References 28 publications


“…Table 1 summarizes the DRL multi-robot path planning methods and the advantages and limitations of each method. From the information in Table 1, it can be summarized that shared parameter type algorithms such as MADDPG and ME-MADDPG can be used in dynamic and complex environments [1][2][3][4]; decentralized architectures such as DQN and DDQN can be considered in stable environments [5][6][7]; large robotic systems facing a large number of dynamic obstacles can be considered using algorithms such as A2C, A3C and TDueling [8][9][10][11]. Validity validated on only a few teams of agents.…”

Section: DRL Multi-robot Path Planning Methods (mentioning)

confidence: 99%

Applications and Challenges of Deep Reinforcement Learning in Multi-robot Path Planning

Qiu

Cheng

2021

JERA


With the rapid advancement of deep reinforcement learning (DRL) in multi-agent systems, a variety of practical application challenges and solutions in the direction of multi-agent deep reinforcement learning (MADRL) are surfacing. Path planning in a collision-free environment is essential for many robots to do tasks quickly and efficiently, and path planning for multiple robots using deep reinforcement learning is a new research area in the field of robotics and artificial intelligence. In this paper, we sort out the training methods for multi-robot path planning, as well as summarize the practical applications in the field of DRL-based multi-robot path planning based on the methods; finally, we suggest possible research directions for researchers.


“…More recently, some works have attempted to leverage machine-learning techniques for solving MAPF. These techniques learn from planning demonstrations collected offline to directly predict the next actions of agents given the current observations by means of reinforcement learning [9,43,50] or using graph neural networks [38,39]. Despite such progress, it remains challenging to determine how these techniques should be applied to MAPP in continuous spaces due to the inherent limitation that assumes the search space to be given a priori (typically as a grid map).…”

Section: Related Work (mentioning)

confidence: 99%

CTRMs: Learning to Construct Cooperative Timed Roadmaps for Multi-agent Path Planning in Continuous Spaces

Okumura

Yonetani

Nishimura

et al. 2022

Preprint


Multi-agent path planning (MAPP) in continuous spaces is a challenging problem with significant practical importance. One promising approach is to first construct graphs approximating the spaces, called roadmaps, and then apply multi-agent pathfinding (MAPF) algorithms to derive a set of conflict-free paths. While conventional studies have utilized roadmap construction methods developed for single-agent planning, it remains largely unexplored how we can construct roadmaps that work effectively for multiple agents. To this end, we propose a novel concept of roadmaps called cooperative timed roadmaps (CTRMs). CTRMs enable each agent to focus on its important locations around potential solution paths in a way that considers the behavior of other agents to avoid inter-agent collisions (i.e., "cooperative"), while being augmented in the time direction to make it easy to derive a "timed" solution path. To construct CTRMs, we developed a machine-learning approach that learns a generative model from a collection of relevant problem instances and plausible solutions and then uses the learned model to sample the vertices of CTRMs for new, previously unseen problem instances. Our empirical evaluation revealed that the use of CTRMs significantly reduced the planning effort with acceptable overheads while maintaining a success rate and solution quality comparable to conventional roadmap construction approaches.


“…By contrast, some studies [5,8,9,24,25,37] have also considered an application to maze-like environments. For example, Damani et al [5] proposed pathfinding via reinforcement and imitation multiagent learning - lifelong (PRIMAL$_2$), a distributed reinforcement learning framework for a lifelong MAPF (LMAPF), which is a variant of the MAPF in which agents are repeatedly assigned new destinations. However, they assumed that tasks are sparsely generated at random locations, and thus, unlike our environment, no local congestion occurs.…”

Section: Related Work (mentioning)

confidence: 99%

Standby-Based Deadlock Avoidance Method for Multi-Agent Pickup and Delivery Tasks

Yamauchi

Miyashita

Sugawara

2022

Preprint


The multi-agent pickup and delivery (MAPD) problem, in which multiple agents iteratively carry materials without collisions, has received significant attention. However, many conventional MAPD algorithms assume a specifically designed grid-like environment, such as an automated warehouse. Therefore, they have many pickup and delivery locations where agents can stay for a lengthy period, as well as plentiful detours to avoid collisions owing to the freedom of movement in a grid. By contrast, because a maze-like environment such as a search-and-rescue or construction site has fewer pickup/delivery locations and their numbers may be unbalanced, many agents concentrate on such locations resulting in inefficient operations, often becoming stuck or deadlocked. Thus, to improve the transportation efficiency even in a maze-like restricted environment, we propose a deadlock avoidance method, called standby-based deadlock avoidance (SBDA). SBDA uses standby nodes determined in real-time using the articulation-point-finding algorithm, and the agent is guaranteed to stay there for a finite amount of time. We demonstrated that our proposed method outperforms a conventional approach. We also analyzed how the parameters used for selecting standby nodes affect the performance.

