2021
DOI: 10.1109/access.2021.3111321
Hybrid Policy Learning for Multi-Agent Pathfinding

Abstract: In this work we study the behavior of groups of autonomous vehicles that are part of Internet of Vehicles systems. One of the challenging modes of operation of such systems is the case when the observability of each vehicle is limited and global/local communication is unstable, e.g., in crowded parking lots. In such scenarios the vehicles have to rely on local observations and exhibit cooperative behavior to ensure safe and efficient trips. This type of problem can be abstracted to the so…

Cited by 14 publications (8 citation statements)
References 30 publications
“…In addition, MAPPER integrates an evolutionary algorithm to enhance the refinement of agent policies. Another example of communication-free approaches can be seen in studies by Skrynnik et al. [69], [160], [161].…”
Section: Methodology Details
confidence: 99%
“…Open Question 9: How can effectively learned implicit communication minimize the need for and overhead of explicit communication while achieving comparable outcomes? Studies such as [69], [161] demonstrate that state-of-the-art decentralized behavior can be achieved by utilizing only local observations per agent, without the need for explicit communication.…”
Section: Challenges and Open Questions, A: Communication
confidence: 99%
“…Target matrix: if the agent’s goal is inside the observation field, the cell containing the goal is set to 1 and all other cells to 0. If the goal does not fall within the view, it is projected onto the nearest cell of the observation field ( Skrynnik et al., 2021 ).…”
Section: Methods
confidence: 99%
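The target-matrix construction quoted above can be sketched as follows. This is an illustrative reading of the description, not code from the cited paper: the function name, the square field of radius `r` centred on the agent, and the use of coordinate clamping as the "projection onto the nearest cell" are all assumptions.

```python
import numpy as np

def target_matrix(agent_xy, goal_xy, radius):
    """Build a (2*radius+1) x (2*radius+1) target matrix for a square
    observation field centred on the agent (illustrative sketch).

    The cell containing the goal is set to 1; if the goal lies outside
    the field, its relative coordinates are clamped onto the nearest
    boundary cell of the field.
    """
    size = 2 * radius + 1
    m = np.zeros((size, size), dtype=np.int8)
    # relative goal position, shifted so the agent sits at (radius, radius)
    gx = goal_xy[0] - agent_xy[0] + radius
    gy = goal_xy[1] - agent_xy[1] + radius
    # project an out-of-view goal onto the nearest cell of the field
    gx = min(max(gx, 0), size - 1)
    gy = min(max(gy, 0), size - 1)
    m[gx, gy] = 1
    return m
```

For example, with `radius=2` a goal one cell away lands inside the 5x5 field, while a goal ten cells away is clamped to the field edge, so exactly one cell is ever set to 1.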
“…It is also worth noting a direction of research where planning algorithms are combined with reinforcement learning. In Skrynnik et al. (2021) and Davydov et al. (2021) , the authors train RL agents in a centralized (QMIX) and a decentralized (PPO) way for solving multi-agent pathfinding tasks. The resulting RL policies are combined with a planning approach (MCTS), which improves the resulting performance.…”
Section: Related Work
confidence: 99%