2023
DOI: 10.1609/aaai.v37i5.25758
Maximum Entropy Population-Based Training for Zero-Shot Human-AI Coordination

Rui Zhao,
Jinming Song,
Yufeng Yuan
et al.

Abstract: We study the problem of training a Reinforcement Learning (RL) agent that is collaborative with humans without using human data. Although such agents can be obtained through self-play training, they can suffer significantly from the distributional shift when paired with unencountered partners, such as humans. In this paper, we propose Maximum Entropy Population-based training (MEP) to mitigate such distributional shift. In MEP, agents in the population are trained with our derived Population Entropy bonus to p…
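The abstract's central idea, a Population Entropy bonus added to the task reward, can be illustrated with a minimal sketch. This is not the paper's implementation: the function name, the `alpha` coefficient, and the choice to compute the entropy of the population's mean action distribution are assumptions based on the abstract's description, sketched here in NumPy.

```python
import numpy as np

def population_entropy_bonus(action_probs, alpha=0.01):
    """Sketch of a population-entropy reward bonus.

    action_probs: (n_agents, n_actions) array, where each row is one
    population member's action distribution at the current state.
    Returns alpha times the entropy of the mean (population) policy.
    """
    mean_policy = action_probs.mean(axis=0)          # average over agents
    entropy = -np.sum(mean_policy * np.log(mean_policy + 1e-12))
    return alpha * entropy
```

Intuitively, the bonus is maximized when the population as a whole covers many different actions, encouraging behavioral diversity among partners; a learner trained against such a population should be less brittle to the distributional shift the abstract describes.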

Cited by 4 publications (3 citation statements) | References 27 publications
“…However, these methods may still limit the agent's cooperation ability in familiar tasks and fail to handle unseen tasks or new agent interactions. Another line of research focuses on zero-shot coordination (ZSC), utilizing Population-Based Training (PBT; Strouse et al. 2021; Zhao et al. 2023; Lupu et al. 2021; Lucas and Allen 2022; Li et al. 2023b, 2024) and Theory of Mind (ToM; Hu et al. 2021a; Wu et al. 2021; Wang et al. 2021) to facilitate adaptive policy development for coordinating with various counterparts without prior coordination experience. However, these ZSC methods demand significant computational resources for data collection and model optimization, and the resulting policies often lack interpretability.…”
Section: Related Work
Mentioning confidence: 99%
“…In previous works on Overcooked-AI, the cooperative performance of an agent is often evaluated with two held-out populations: a self-play (SP) agent and a human proxy model. We conduct a comparative analysis between our proposed ProAgent and five alternatives prevalent in the field, including SP (Tesauro 1994; Carroll et al. 2019), PBT (Jaderberg et al. 2017), FCP (Strouse et al. 2021), MEP (Zhao et al. 2023), and COLE (Li et al. 2023b, 2024). We combined the above six algorithms in pairs to construct 36 pairs.…”
Section: Experiments, Experimental Settings
Mentioning confidence: 99%