Model-Based Opponent Modeling

Yu, Xiaopeng; Jiang, Jiechuan; Jiang, Haobin; Lu, Zongqing

doi:10.48550/arxiv.2108.01843

Cited by 1 publication

(1 citation statement)

References 16 publications

“…Grover et al use imitation learning and contrastive learning to predict the next actions of an opponent [17]. Yu et al utilize a model-based approach with neural networks for predicting the next state of the opponents [18]. We differ from prior approaches by 1) only utilizing sparse observations (i.e., we do not have access to the true states of opponents unless detected) and 2) we justify the value of the designed filter by helping train a MARL to complete S&T tasks which could not be done without the filter as shown in §V-B.…”

Section: B Learning-based Nonlinear Filters (mentioning)

confidence: 99%

Consistent Epistemic Planning for Multiagent Deep Reinforcement Learning

Luo, Tian, et al. 2023

Preprint

Multi-agent cooperation requires reasoning about beliefs in partially observable environments without communication, but traditional Multi-agent Deep Reinforcement Learning (MADRL) algorithms struggle to handle the uncertainty of agents. Multi-agent Epistemic Planning (MEP) aims to let an agent find the best plan for completing the cooperation task, and so handle this uncertainty more effectively. However, inconsistent planning arises if MEP is simply added to MADRL. We propose a MADRL-based policy network architecture called SMM-MEPP: Shared Mental Model - Multi-agent Epistemic Planning Policy. First, Multi-agent Epistemic Planning and MADRL are investigated to build the "Perception-Planning-Action" multi-agent epistemic planning framework. Second, the mental model from psychology is introduced and described as a neural network. Third, a parameter-sharing mechanism is used to achieve the shared mental model and maintain the consistency of epistemic planning. Finally, we apply the SMM-MEPP architecture to three advanced MADRL algorithms (i.e., MAAC, MADDPG and MAPPO) and conduct comparative experiments on multi-agent cooperation tasks. Experiments show that the proposed method yields consistent planning for multiple agents and improves convergence speed or training effect in partially observable environments without communication.
