2020
DOI: 10.48550/arxiv.2006.10800
Preprint

Weighted QMIX: Expanding Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning

Abstract: QMIX is a popular Q-learning algorithm for cooperative MARL in the centralised training and decentralised execution paradigm. In order to enable easy decentralisation, QMIX restricts the joint action Q-values it can represent to be a monotonic mixing of each agent's utilities. However, this restriction prevents it from representing value functions in which an agent's ordering over its actions can depend on other agents' actions. To analyse this representational limitation, we first formalise the objective QMIX…
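
The limitation described in the abstract can be illustrated with a small check. The sketch below is an illustration under stated assumptions, not code from the paper: `ordering_depends_on_others` is a hypothetical helper, and both payoff tables are made up. It takes a two-agent payoff table for a single state and reports whether agent 1's strict preference between two of its actions flips depending on agent 2's action. When it does, no monotonic mixing of per-agent utilities can reproduce the table exactly, because a monotonic mixer preserves each agent's ordering over its own utilities.

```python
from itertools import combinations


def ordering_depends_on_others(payoff):
    """Return True if agent 1's strict preference between two of its
    actions flips depending on agent 2's action.

    payoff[a][b] is the joint value when agent 1 plays a and agent 2 plays b.
    Only agent 1's ordering is checked; transpose the table to check agent 2.
    """
    n_other = len(payoff[0])
    for a, a_alt in combinations(range(len(payoff)), 2):
        diffs = [payoff[a][b] - payoff[a_alt][b] for b in range(n_other)]
        # A strict flip: a is better for some b and worse for another b.
        if any(d > 0 for d in diffs) and any(d < 0 for d in diffs):
            return True
    return False


# Illustrative tables (assumed for this sketch, not taken from the paper).
additive = [[2, 3], [4, 5]]      # agent 1 prefers action 1 whatever agent 2 does
coordination = [[1, 0], [0, 1]]  # agent 1's best action depends on agent 2

print(ordering_depends_on_others(additive))      # False
print(ordering_depends_on_others(coordination))  # True
```

The check is a necessary condition only: passing it does not guarantee that a particular mixing network will fit the table.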

Cited by 13 publications (29 citation statements)
References 10 publications

“…Finally, we design ablation studies to investigate the improvement of GVR. Our method is compared with state-of-the-art baselines including QMIX (Rashid et al, 2018), QPLEX , and WQMIX (Rashid et al, 2020). All results are evaluated over 5 seeds.…”
Section: Methods (mentioning)
confidence: 99%
“…The other works improve the coordination from different perspectives. WQMIX (Rashid et al, 2020) tries to solve the underestimation of the optimal joint values that arise from the representation limitation, where an auxiliary network with complete expressiveness capacity is applied to distinguishes samples with low expressive values. By placing a predefined weight on these samples, WQMIX can alleviate the underestimation of optimal joint Q values.…”
Section: B2 Value Decomposition Methods (mentioning)
confidence: 99%
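
The weighting idea in the statement above can be sketched as a weighted squared error. This is a simplified illustration in the spirit of optimistic weighting, not the authors' implementation: the function name `weighted_td_loss`, the parameter `alpha`, and the sample values are all assumptions. Samples whose monotonic estimate falls below its target keep full weight, and the remaining samples are down-weighted by `alpha`.

```python
def weighted_td_loss(q_tot, targets, alpha=0.1):
    """Mean weighted squared error between monotonic estimates and targets.

    The weight is 1.0 where the estimate underestimates its target and
    alpha (an assumed hyperparameter, 0 < alpha <= 1) everywhere else.
    """
    if len(q_tot) != len(targets):
        raise ValueError("q_tot and targets must have the same length")
    if not q_tot:
        raise ValueError("at least one sample is required")
    total = 0.0
    for q, y in zip(q_tot, targets):
        weight = 1.0 if q < y else alpha
        total += weight * (q - y) ** 2
    return total / len(q_tot)


# Hypothetical values: the first sample is underestimated, the second is not.
loss = weighted_td_loss([1.0, 3.0], [2.0, 2.0], alpha=0.5)
print(loss)  # 0.75
```

Setting `alpha=1.0` recovers an ordinary unweighted mean squared error, which makes the effect of the weighting easy to compare.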
“…Reward decomposition is a promising way to exploit the CTDE paradigm Rashid et al (2018);Foerster et al (2018); Mahajan et al (2019); Wang et al (2020b); Rashid et al (2020), which is a composite of a local utility network for each agent's execution and a mixing network for combining local utilities into a global action value. Many existing methods try to learn a compelling local utility network, which typically needs a relatively extensive network and more execution time.…”
Section: Introduction (mentioning)
confidence: 99%
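
The split between local utilities and a mixing step described in the statement above can be sketched in a few lines. This is a deliberately reduced, single-layer illustration rather than any cited method's architecture: `monotonic_mix` is a hypothetical name, and the fixed weights stand in for whatever a trained mixing network would produce. Taking the absolute value of each weight keeps the mixed value non-decreasing in every agent's utility.

```python
def monotonic_mix(utilities, weights, bias=0.0):
    """Combine per-agent utilities into one joint action value.

    abs() forces every mixing weight to be non-negative, so raising any
    single agent's utility can never lower the mixed value.
    """
    if len(utilities) != len(weights):
        raise ValueError("utilities and weights must have the same length")
    return sum(abs(w) * q for w, q in zip(weights, utilities)) + bias


# Hypothetical utilities for two agents and made-up raw weights.
base = monotonic_mix([1.0, 2.0], [-0.5, 2.0])
raised = monotonic_mix([2.0, 2.0], [-0.5, 2.0])
print(base, raised)  # 4.5 5.0
```

Because the mix is monotonic, each agent can pick the action that maximises its own utility and the joint choice still maximises the mixed value, which is what allows decentralised execution.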