2021
DOI: 10.1109/access.2021.3111321
Hybrid Policy Learning for Multi-Agent Pathfinding

Abstract: In this work we study the behavior of groups of autonomous vehicles that are part of Internet of Vehicles systems. One of the challenging modes of operation of such systems is the case when the observability of each vehicle is limited and global/local communication is unstable, e.g., in crowded parking lots. In such scenarios the vehicles have to rely on local observations and exhibit cooperative behavior to ensure safe and efficient trips. This type of problem can be abstracted to the so…

Cited by 14 publications (8 citation statements)
References 30 publications
“…In addition, MAPPER integrates an evolutionary algorithm to enhance the refinement of agent policies. Another example of communication-free approaches can be seen in studies by Skrynnik et al. [69], [160], [161].…”
Section: Methodology Details
confidence: 99%
“…Open Question 9: How can effectively learned implicit communication minimize the need for and overhead of explicit communication while achieving comparable outcomes? Studies such as [69], [161] demonstrate that state-of-the-art decentralized behavior can be achieved by utilizing only local observations per agent, without the need for explicit communication.…”
Section: Challenges and Open Questions, A: Communication
confidence: 99%
“…Target matrix: if the agent’s goal is inside the observation field, the cell containing the goal is set to 1 and all other cells to 0. If the goal does not fall within the view, it is projected onto the nearest cell of the observation field ( Skrynnik et al., 2021 ).…”
Section: Methods
confidence: 99%
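The target-matrix construction quoted above can be sketched as follows. This is an illustrative reading of the description, not code from the cited paper: the function name, the square field of radius `r` centred on the agent, and the use of coordinate clamping as the "projection onto the nearest cell" are all assumptions.

```python
import numpy as np

def target_matrix(agent_xy, goal_xy, radius):
    """Build a (2*radius+1) x (2*radius+1) target matrix for a square
    observation field centred on the agent (illustrative sketch).

    The cell containing the goal is set to 1; if the goal lies outside
    the field, its relative coordinates are clamped onto the nearest
    boundary cell of the field.
    """
    size = 2 * radius + 1
    m = np.zeros((size, size), dtype=np.int8)
    # relative goal position, shifted so the agent sits at (radius, radius)
    gx = goal_xy[0] - agent_xy[0] + radius
    gy = goal_xy[1] - agent_xy[1] + radius
    # project an out-of-view goal onto the nearest cell of the field
    gx = min(max(gx, 0), size - 1)
    gy = min(max(gy, 0), size - 1)
    m[gx, gy] = 1
    return m
```

For example, with `radius=2` a goal one cell away lands inside the 5x5 field, while a goal ten cells away is clamped to the field edge, so exactly one cell is ever set to 1.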
“…It is also worth noting a direction of research where planning algorithms are combined with reinforcement learning. In Skrynnik et al. (2021) and Davydov et al. (2021) , the authors train RL agents in a centralized (QMIX) and a decentralized (PPO) way for solving multi-agent pathfinding tasks. The resulting RL policies are combined with a planning approach (MCTS), which improves the resulting performance.…”
Section: Related Work
confidence: 99%