2021
DOI: 10.1007/978-3-030-86855-0_12

Q-Mixing Network for Multi-agent Pathfinding in Partially Observable Grid Environments


Cited by 2 publications (2 citation statements) · References 12 publications
“…It is also worth noting a direction of research where planning algorithms are combined with reinforcement learning. In Skrynnik et al (2021) and Davydov et al (2021), the authors train RL agents in a centralized (QMIX) and decentralized (PPO) way for solving multi-agent pathfinding tasks. The resulting RL policies are combined with a planning approach (MCTS), which leverages the resulting performance.…”
Section: Related Work (mentioning)
confidence: 99%
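The statement above refers to centralized training with QMIX, in which per-agent Q-values are combined by a state-conditioned, monotonic mixing network. The following is a minimal sketch of such a mixing network, assuming per-agent Q-values and a global state are available at training time; the class and layer names (QMixer, hyper_w1, embed_dim, etc.) and sizes are illustrative and not taken from the cited papers.

```python
# Minimal sketch of a QMIX-style mixing network (centralized value factorisation).
# Hypernetworks produce the mixing weights from the global state; taking their
# absolute values keeps the joint value monotone in each agent's Q-value.
import torch
import torch.nn as nn


class QMixer(nn.Module):
    def __init__(self, n_agents: int, state_dim: int, embed_dim: int = 32):
        super().__init__()
        self.n_agents = n_agents
        self.embed_dim = embed_dim
        self.hyper_w1 = nn.Linear(state_dim, n_agents * embed_dim)
        self.hyper_b1 = nn.Linear(state_dim, embed_dim)
        self.hyper_w2 = nn.Linear(state_dim, embed_dim)
        self.hyper_b2 = nn.Sequential(
            nn.Linear(state_dim, embed_dim), nn.ReLU(), nn.Linear(embed_dim, 1)
        )

    def forward(self, agent_qs: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        # agent_qs: (batch, n_agents), state: (batch, state_dim)
        w1 = torch.abs(self.hyper_w1(state)).view(-1, self.n_agents, self.embed_dim)
        b1 = self.hyper_b1(state).view(-1, 1, self.embed_dim)
        hidden = torch.relu(torch.bmm(agent_qs.unsqueeze(1), w1) + b1)
        w2 = torch.abs(self.hyper_w2(state)).view(-1, self.embed_dim, 1)
        b2 = self.hyper_b2(state).view(-1, 1, 1)
        q_total = torch.bmm(hidden, w2) + b2
        # Joint action value used for the centralized TD loss during training;
        # at execution time each agent acts on its own Q-values only.
        return q_total.view(-1, 1)
```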
“…Compared to playing against a single policy, Smith et al (2020) claimed that such a mechanism introduces stochasticity in the opponent and causes previous experiences to be forgotten, making algorithms slow to converge. Thus, the authors proposed a method that distills the opponent mixture into a single policy via Q-mixing (Davydov et al, 2021).…”
Section: Related Work (mentioning)
confidence: 99%
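For orientation, a minimal sketch of the Q-mixing idea mentioned in the statement above: opponent-specific action values are averaged under a belief over which opponent is currently being faced, yielding a single policy against the opponent mixture. All names and the belief source are illustrative assumptions, not an implementation from the cited papers.

```python
# Sketch of Q-mixing: combine per-opponent Q-values under an opponent belief.
import numpy as np


def q_mixing(q_per_opponent: np.ndarray, opponent_belief: np.ndarray) -> np.ndarray:
    """Combine per-opponent action values into one value estimate.

    q_per_opponent: (n_opponents, n_actions) action values learned against
        each individual opponent policy.
    opponent_belief: (n_opponents,) probability of facing each opponent,
        e.g. from a classifier over the observation history (assumed here).
    """
    return opponent_belief @ q_per_opponent  # (n_actions,)


# Usage: act greedily with respect to the mixed value estimate.
q = np.array([[1.0, 0.2], [0.1, 0.9]])   # toy values against two opponents
belief = np.array([0.7, 0.3])            # current belief over opponents
action = int(np.argmax(q_mixing(q, belief)))
```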