2020
DOI: 10.48550/arxiv.2008.01062
Preprint

QPLEX: Duplex Dueling Multi-Agent Q-Learning

Abstract: We explore value-based multi-agent reinforcement learning (MARL) in the popular paradigm of centralized training with decentralized execution (CTDE). CTDE requires the consistency of the optimal joint action selection with optimal individual action selections, which is called the IGM (Individual-Global-Max) principle. However, in order to achieve scalability, existing MARL methods either limit representation expressiveness of their value function classes or relax the IGM consistency, which may lead to poor pol…
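For reference, the IGM principle named in the abstract is the standard consistency condition from the value-factorization literature. A sketch in common notation (joint observation history $\boldsymbol{\tau}$, joint action $\boldsymbol{a}$, per-agent utilities $Q_i$; these symbol names are assumptions, not necessarily the paper's exact notation):

% IGM (Individual-Global-Max): greedy selection on the joint value
% Q_tot must coincide with each agent's local greedy selection, so
% decentralized agents can jointly act optimally.
\arg\max_{\boldsymbol{a}} Q_{\mathrm{tot}}(\boldsymbol{\tau}, \boldsymbol{a})
  = \Bigl( \arg\max_{a_1} Q_1(\tau_1, a_1),\; \ldots,\; \arg\max_{a_n} Q_n(\tau_n, a_n) \Bigr)

QPLEX's stated contribution is a duplex dueling architecture whose function class realizes this condition without the representational restrictions of earlier factorizations.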

Cited by 34 publications (68 citation statements)
References 14 publications (20 reference statements)
“…(4) Why do we choose to predict target $Q^{\text{ext}}_i$ for generating intrinsic rewards rather than other choices (Section 5.4)? We will propose several didactic examples and demonstrate the advantage of our method in coordinated exploration, and evaluate our method on the StarCraft II micromanagement (SMAC) benchmark [8] compared with existing state-of-the-art multi-agent reinforcement learning (MARL) algorithms: QPLEX [7], Weighted-QMIX [39], QTRAN [6], QMIX [5], VDN [4], RODE [40], and MAVEN [15].…”
Section: Methods
confidence: 99%
“…The independence of the inference module leads to another advantage: EMC's architecture can be adopted into many value-factorization-based multi-agent algorithms that utilize the CTDE paradigm, i.e., the general function f in Figure 2a can represent the specific (linear, monotonic, and IGM) value factorization structures of VDN [4], QMIX [5], and QPLEX [7], respectively. In this paper, we utilize these state-of-the-art algorithms for the inference module.…”
Section: Episodic Memory
confidence: 99%
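The three factorization structures named in the statement above differ in how the mixing function f constrains the joint value. A sketch of the standard forms from the cited papers (the mixing-function name $f_{\mathrm{mix}}$ and weight symbols $\lambda_i$ are illustrative assumptions):

% VDN [4]: linear factorization, a strict subset of the IGM class.
Q_{\mathrm{tot}}(\boldsymbol{\tau}, \boldsymbol{a}) = \sum_{i=1}^{n} Q_i(\tau_i, a_i)

% QMIX [5]: state-conditioned monotonic mixing, still sufficient
% (not necessary) for IGM.
Q_{\mathrm{tot}} = f_{\mathrm{mix}}(Q_1, \ldots, Q_n;\, s), \qquad
\frac{\partial Q_{\mathrm{tot}}}{\partial Q_i} \ge 0

% QPLEX [7]: duplex dueling decomposition, Q_tot = V_tot + A_tot,
% with positively weighted individual advantages so that the greedy
% joint action matches the per-agent greedy actions, realizing the
% full IGM function class.
Q_{\mathrm{tot}}(\boldsymbol{\tau}, \boldsymbol{a})
  = V_{\mathrm{tot}}(\boldsymbol{\tau})
  + \sum_{i=1}^{n} \lambda_i(\boldsymbol{\tau}, \boldsymbol{a})\, A_i(\tau_i, a_i),
  \qquad \lambda_i > 0

This ordering (linear ⊂ monotonic ⊂ IGM) is why the quoted work can swap these factorizations into the same general function f.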
“…This method can apply to competitive settings but does not address the multiagent credit assignment problem. Another line of work using centralised training, decentralised execution, and Q-learning comprises the implicit credit assignment methods such as [25], [24], and [36]. More recently, [43] showed the effectiveness of PPO in multiagent problems for discrete action space domains.…”
Section: Related Work
confidence: 99%