2021
DOI: 10.1007/s10458-021-09506-w

Analysing factorizations of action-value networks for cooperative multi-agent reinforcement learning

Abstract: Recent years have seen the application of deep reinforcement learning techniques to cooperative multi-agent systems, with great empirical success. However, given the lack of theoretical insight, it remains unclear what the employed neural networks are learning, or how we should enhance their learning power to address the problems on which they fail. In this work, we empirically investigate the learning power of various network architectures on a series of one-shot games. Despite their simplicity, these games c…

Cited by 14 publications (31 citation statements)
References 18 publications (26 reference statements)

Deep Coordination Graphs
Böhmer, Kurin, Whiteson (2019). Preprint. Self Cite.
“…If the employed value function does not have the representational capacity to distinguish the values of coordinated and uncoordinated actions, an optimal policy cannot be learned. However, Castellini et al (2019) show that higher-order factorization of the value function works surprisingly well in one-shot games that are vulnerable to relative overgeneralization, even if each factor depends on the actions of only a small subset of agents. Such a higher-order factorization can be expressed as an undirected coordination graph (CG, Guestrin et al, 2002a), where each vertex represents one agent and each (hyper-)edge one payoff function over the joint action space of the connected agents.…”
Section: Introduction (mentioning)
confidence: 98%
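The statement above describes a value function factored over an undirected coordination graph, with one payoff function per (hyper-)edge. Below is a minimal sketch of that idea for a three-agent chain graph; the edge set, action-space size, and random payoff tables are illustrative assumptions (Castellini et al. use a small neural network per edge rather than a table), not details taken from either paper.

```python
import itertools
import numpy as np

# Illustrative 3-agent chain coordination graph with edges (0,1) and (1,2).
# Each edge carries one payoff function over the joint actions of the two
# agents it connects (here a random table standing in for a learned network).
n_agents, n_actions = 3, 4
rng = np.random.default_rng(0)
edges = [(0, 1), (1, 2)]
payoffs = {e: rng.normal(size=(n_actions, n_actions)) for e in edges}

def q_joint(actions):
    """Factored joint value: sum of the per-edge payoffs."""
    return sum(payoffs[(i, j)][actions[i], actions[j]] for i, j in edges)

# For a game this small the greedy joint action can be found by enumeration;
# larger graphs would use message passing (e.g. max-sum) instead.
best = max(itertools.product(range(n_actions), repeat=n_agents), key=q_joint)
print("greedy joint action:", best, "value:", round(q_joint(best), 3))
```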
“…Sparse cooperative Q-learning (Kok & Vlassis, 2006) applies CGs to MARL but does not scale to modern benchmarks, as each payoff function (f_12 and f_23 in Figure 1b) is represented as a table over the state and joint action space of the connected agents. Castellini et al (2019) use neural networks to approximate payoff functions, but only in one-shot games, and still require a unique function for each edge in the CG. Consequently, value decomposition networks (VDN, Sunehag et al, 2018) correspond to an unconnected CG.…”
Section: Introduction (mentioning)
confidence: 99%
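To make the contrast with an unconnected CG concrete, the sketch below (a hedged illustration; the 2x2 payoff matrix and the least-squares fit are my own assumptions) shows why a VDN-style sum of per-agent utilities cannot distinguish coordinated from uncoordinated actions in a simple coordination payoff, while a single pairwise edge represents it exactly.

```python
import numpy as np

# Two agents, two actions; reward only when the actions match.
Q = np.array([[1.0, 0.0],
              [0.0, 1.0]])

# Best additive (VDN-style, unconnected CG) fit: Q(a1, a2) ~ u1[a1] + u2[a2],
# solved as a least-squares problem over the four joint actions.
rows, targets = [], []
for a1 in range(2):
    for a2 in range(2):
        row = np.zeros(4)
        row[a1] = 1.0       # selects u1[a1]
        row[2 + a2] = 1.0   # selects u2[a2]
        rows.append(row)
        targets.append(Q[a1, a2])
u, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
Q_additive = np.add.outer(u[:2], u[2:])

print(Q_additive.round(2))  # every entry 0.5: all joint actions look identical
# A single pairwise payoff function f_12 can simply store Q and is exact.
```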

Deep Coordination Graphs
Böhmer, Kurin, Whiteson (2019). Preprint. Self Cite.
“…Importantly, we show that carefully learned sparse graphs can significantly outperform complete graphs. We expect these observations to be a good supplementation to Castellini et al [23] and can eliminate any possible misunderstanding about sparse coordination graphs. Stability and Recommendation Table 3 in Appendix A shows the stability of each implementation on all tasks across the MACO benchmark.…”
Section: Which Methods Is Better For Learning Dynamically Sparse Coor... (mentioning)
confidence: 60%
“…Again, we observe a performance gap between complete coordination graphs and most implementations of context-aware sparse graphs. Castellini et al [23] finds that (randomly) sparse coordination graphs perform much worse than full graphs. This is aligned with our experimental results.…”
Section: Which Methods Is Better For Learning Dynamically Sparse Coor... (mentioning)
confidence: 99%
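The two statements above compare complete, learned-sparse, and randomly sparse coordination graphs. As a purely illustrative sketch (the agent count, sparsity level, and random payoffs are assumptions, and the actual methods learn both the edges and the payoffs), the only structural difference between the variants is the edge set over which the factored value is summed:

```python
import itertools
import numpy as np

n_agents, n_actions = 6, 3
rng = np.random.default_rng(1)

# Complete coordination graph: one payoff function for every pair of agents.
complete_edges = list(itertools.combinations(range(n_agents), 2))

# Randomly sparse graph: keep a fixed fraction of the pairs; context-aware
# methods would instead choose the edges from the current observation.
idx = rng.choice(len(complete_edges), size=len(complete_edges) // 3, replace=False)
sparse_edges = [complete_edges[k] for k in idx]

def make_payoffs(edges):
    return {e: rng.normal(size=(n_actions, n_actions)) for e in edges}

def q_joint(actions, edges, payoffs):
    # Same factored form in both cases: sum of per-edge payoffs.
    return sum(payoffs[(i, j)][actions[i], actions[j]] for i, j in edges)

print(len(complete_edges), "edges in the complete graph vs",
      len(sparse_edges), "in the randomly sparse one")
```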