2021
DOI: 10.48550/arxiv.2111.05992
Preprint

On the Use and Misuse of Absorbing States in Multi-agent Reinforcement Learning

Abstract: The creation and destruction of agents in cooperative multi-agent reinforcement learning (MARL) is a critically underexplored area of research. Current MARL algorithms often assume that the number of agents within a group remains fixed throughout an experiment. However, in many practical problems, an agent may terminate before its teammates. This early termination issue presents a challenge: the terminated agent must learn from the group's success or failure, which occurs beyond its own existence. We refer to …
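In fixed-team-size MARL implementations, the absorbing-state convention the title refers to is usually realized by padding a terminated agent's slot with a placeholder observation and masking it out of the learning update. Below is a minimal sketch of that common workaround; the function name pad_absorbing, the zero-vector placeholder, and the mask convention are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def pad_absorbing(observations, alive, obs_dim):
    """Pad a variable-size list of agent observations to a fixed-size array.

    Terminated agents keep an all-zero 'absorbing' observation and are
    flagged in a mask so the learner can ignore or down-weight them.
    """
    padded = np.zeros((len(alive), obs_dim), dtype=np.float32)
    mask = np.asarray(alive, dtype=np.float32)   # 1.0 while the agent exists
    for i, (is_alive, obs) in enumerate(zip(alive, observations)):
        if is_alive:
            padded[i] = obs                      # real observation
        # else: leave the zero absorbing observation in place
    return padded, mask

# Example: a 3-agent team in which agent 1 terminated early.
obs = [np.ones(4), None, np.full(4, 2.0)]
padded, mask = pad_absorbing(obs, alive=[True, False, True], obs_dim=4)
print(mask)   # [1. 0. 1.]
```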

Cited by 9 publications (16 citation statements)
References 9 publications
“…However, since all elements that refine or de-refine are removed immediately and their individual trajectories terminate, they do not observe future states and cannot use future rewards to assign credit to their refine/de-refine actions. This is the posthumous multi-agent credit assignment problem [8]. We propose the use of centralized training to address this problem.…”
Section: Agent Creation and Deletion
confidence: 99%
“…In this work, we present the first formulation of AMR as a Markov game [25,18] and propose a novel fully-cooperative deep multi-agent reinforcement learning (MARL) algorithm [9,16,40] called Value Decomposition Graph Network (VDGN) to train a team of independently and simultaneously acting agents, each of which is a decision-maker for an element, to optimize a global performance metric and find anticipatory refinement policies. Because refinement and de-refinement actions at each step of the AMR Markov game lead to the creation and deletion of agents, we face the posthumous credit assignment problem [8]: agents who contributed to a future reward are not necessarily present at the future time to experience it. We show that VDGN, by virtue of centralized training with decentralized execution [24], addresses this key challenge.…”
Section: Introduction
confidence: 99%
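Value decomposition of the kind VDGN builds on expresses the team's action value as a sum of per-agent utilities, so a centralized TD target can propagate credit back to every contributing agent. A VDN-style sketch under that reading follows; the masking of deleted agents and all network sizes are illustrative assumptions, not the VDGN architecture itself (which additionally uses graph networks).

```python
import torch
import torch.nn as nn

class PerAgentUtility(nn.Module):
    """Utility network shared across agents (parameter sharing)."""
    def __init__(self, obs_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                 nn.Linear(64, n_actions))

    def forward(self, obs):                      # obs: [batch, n_agents, obs_dim]
        return self.net(obs)                     # -> [batch, n_agents, n_actions]

def joint_q(utility, obs, actions, mask):
    """Team action value as the masked sum of per-agent utilities.

    actions: [batch, n_agents] chosen action indices (long tensor)
    mask:    [batch, n_agents] 1.0 while an agent exists, 0.0 after deletion
    """
    q_all = utility(obs)
    q_taken = q_all.gather(-1, actions.unsqueeze(-1)).squeeze(-1)
    return (q_taken * mask).sum(dim=-1)          # one scalar per batch entry
```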
“…The episodic interaction of the two classes can be highlighted in Figure 5b. The Multi-Agent Posthumous Credit Assignment (MA-POCA) trainer was introduced by Unity Technologies (Cohen et al. 2021). MA-POCA utilises the Independent Actor with Centralized Critic (IACC) framework, where a critic that is trained on joint information is utilized for updating independent agents or actors.…”
Section: Episodic Training of MAS
confidence: 99%
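The IACC pattern described in this statement pairs independent actors with a critic trained on joint information. A simplified PyTorch sketch of that split is given below; the attention mechanism MA-POCA uses to handle a variable number of agents is omitted, and the shapes, names, and simple MSE critic target are assumptions for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CentralizedCritic(nn.Module):
    """Critic trained on joint (all-agent) information; actors stay independent."""
    def __init__(self, joint_obs_dim):
        super().__init__()
        self.v = nn.Sequential(nn.Linear(joint_obs_dim, 128), nn.ReLU(),
                               nn.Linear(128, 1))

    def forward(self, joint_obs):
        return self.v(joint_obs).squeeze(-1)

def critic_loss(critic, joint_obs, group_reward, next_joint_obs, done, gamma=0.99):
    """Bootstrapped value target built from the group reward and the next joint
    state, so credit can reach agents whose own trajectories have already ended."""
    target = group_reward + gamma * (1.0 - done) * critic(next_joint_obs).detach()
    return F.mse_loss(critic(joint_obs), target)

def actor_loss(log_prob_taken, advantage):
    """Independent-actor policy-gradient loss; the advantage comes from the critic."""
    return -(log_prob_taken * advantage.detach()).mean()
```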
“…Because the CTDE paradigm learns from multiple agents sharing their policies, it can learn the optimal policy faster and more effectively than a single agent. Multi-Agent Posthumous Credit Assignment (MA-POCA) proposed by Cohen et al. [7] performs … [Figure 2 caption: Overview of MRC model: Following the CTDE method, the critic collects the environment information observed by each user each time a reset occurs, and determines the actions accordingly.]…”
Section: Reinforcement Learning
confidence: 99%
“…Table 1: MRC model hyperparameters: γ and λ are the discount factor and the Temporal-Difference (TD) parameter in reinforcement learning [7].…”
Section: Action
confidence: 99%
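For reference, γ and λ enter training through the λ-return used as the value target. A short sketch of that computation follows; the recursion is the standard TD(λ) return, and the default values and example trajectory are illustrative rather than the MRC model's settings.

```python
import numpy as np

def lambda_returns(rewards, values, gamma=0.99, lam=0.95):
    """TD(lambda) returns for one trajectory.

    rewards: r_0 .. r_{T-1}
    values:  V(s_0) .. V(s_T)  (the last entry is the bootstrap value)
    gamma is the discount factor, lam the TD(lambda) mixing parameter.
    """
    T = len(rewards)
    returns = np.zeros(T)
    g = values[T]                                # bootstrap from the final value
    for t in reversed(range(T)):
        g = rewards[t] + gamma * ((1 - lam) * values[t + 1] + lam * g)
        returns[t] = g
    return returns

# Example: three steps with unit reward and constant value estimates.
print(lambda_returns([1.0, 1.0, 1.0], [0.5, 0.5, 0.5, 0.0]))
```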