2023
DOI: 10.1609/aaai.v37i5.25758
Maximum Entropy Population-Based Training for Zero-Shot Human-AI Coordination

Rui Zhao,
Jinming Song,
Yufeng Yuan
et al.

Abstract: We study the problem of training a Reinforcement Learning (RL) agent that is collaborative with humans without using human data. Although such agents can be obtained through self-play training, they can suffer significantly from the distributional shift when paired with unencountered partners, such as humans. In this paper, we propose Maximum Entropy Population-based training (MEP) to mitigate such distributional shift. In MEP, agents in the population are trained with our derived Population Entropy bonus to p…
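The abstract's central idea, a Population Entropy bonus added to the task reward, can be illustrated with a minimal sketch. This is not the paper's implementation: the function name, the `alpha` coefficient, and the choice to compute the entropy of the population's mean action distribution are assumptions based on the abstract's description, sketched here in NumPy.

```python
import numpy as np

def population_entropy_bonus(action_probs, alpha=0.01):
    """Sketch of a population-entropy reward bonus.

    action_probs: (n_agents, n_actions) array, where each row is one
    population member's action distribution at the current state.
    Returns alpha times the entropy of the mean (population) policy.
    """
    mean_policy = action_probs.mean(axis=0)          # average over agents
    entropy = -np.sum(mean_policy * np.log(mean_policy + 1e-12))
    return alpha * entropy
```

Intuitively, the bonus is maximized when the population as a whole covers many different actions, encouraging behavioral diversity among partners; a learner trained against such a population should be less brittle to the distributional shift the abstract describes.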

Cited by 4 publications (3 citation statements) | References 27 publications
“…However, these methods may still limit the agent's cooperation ability in familiar tasks and fail to handle unseen tasks or new agent interactions. Another line of research focuses on zero-shot coordination (ZSC), utilizing Population-Based Training (PBT; Strouse et al. 2021; Zhao et al. 2023; Lupu et al. 2021; Lucas and Allen 2022; Li et al. 2023b, 2024) and Theory of Mind (ToM; Hu et al. 2021a; Wu et al. 2021; Wang et al. 2021) to facilitate adaptive policy development for coordinating with various counterparts without prior coordination experience. However, these ZSC methods demand significant computational resources for data collection and model optimization, and the resulting policies often lack interpretability.…”
Section: Related Work
Mentioning confidence: 99%
“…In previous works on Overcooked-AI, the cooperative performance of an agent is often evaluated with two held-out populations: a self-play (SP) agent and a human proxy model. We conduct a comparative analysis between our proposed ProAgent and five alternatives prevalent in the field, including SP (Tesauro 1994; Carroll et al. 2019), PBT (Jaderberg et al. 2017), FCP (Strouse et al. 2021), MEP (Zhao et al. 2023), and COLE (Li et al. 2023b, 2024). We combined the above six algorithms in pairs to construct 36 pairs.…”
Section: Experiments, Experimental Settings
Mentioning confidence: 99%