2023
DOI: 10.1609/aaai.v37i9.26288

DeCOM: Decomposed Policy for Constrained Cooperative Multi-Agent Reinforcement Learning

Abstract: In recent years, multi-agent reinforcement learning (MARL) has achieved impressive performance in various applications. However, physical limitations, budget restrictions, and many other factors usually impose constraints on a multi-agent system (MAS), which cannot be handled by traditional MARL frameworks. Specifically, this paper focuses on constrained MASes where agents work cooperatively to maximize the expected team-average return under various constraints on expected team-average costs, and develops a c…
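The setting described in the abstract is a constrained cooperative optimization problem. As a minimal sketch, it can be written in generic constrained-MDP notation (the symbols J, J_{c_k}, d_k, N, and K below are illustrative notation, not taken from the paper):

```latex
\max_{\pi} \; J(\pi) = \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, \frac{1}{N}\sum_{i=1}^{N} r_t^{i}\right]
\quad \text{s.t.} \quad
J_{c_k}(\pi) = \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, \frac{1}{N}\sum_{i=1}^{N} c_{k,t}^{i}\right] \le d_k,
\qquad k = 1, \dots, K.
```

Here each of the N agents contributes its own reward r_t^i and costs c_{k,t}^i, which are averaged over agents to form the team-average return and team-average costs named in the abstract, with one threshold d_k per constraint.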

Cited by 1 publication (6 citation statements). References 29 publications.
“…Recently, the centralized training with decentralized execution (CTDE) [29,30] paradigm has attracted the most attention in the MARL community, as it exploits global information during offline training while allowing decentralized online execution based only on local information. Current MARL methods mainly implement CTDE in two patterns: one is based on the actor-critic framework and learns a centralized critic to compute each actor's policy gradients [9,10,12,17,18,31,32,33,34]; the other factorizes a centralized Q-value into decentralized per-agent Q-values via networks with various types of constraints [11,14,15,19,35,36]. Despite these methods having generated remarkable…”
Section: Literature Review
confidence: 99%
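The second pattern described in this statement (value factorization) is easiest to see in code. Below is a minimal sketch in the additive style of VDN, written in PyTorch; the class and function names and the network sizes are illustrative assumptions, not the cited methods' actual implementations:

```python
import torch
import torch.nn as nn

class AgentQNet(nn.Module):
    """Per-agent utility network: local observation -> Q-values over actions."""
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

def factorized_q_total(agent_nets, observations, actions):
    """VDN-style factorization: the centralized Q-value is the sum of the
    decentralized per-agent Q-values of the chosen actions."""
    per_agent_q = [
        net(obs).gather(-1, act.unsqueeze(-1)).squeeze(-1)  # Q_i(o_i, a_i)
        for net, obs, act in zip(agent_nets, observations, actions)
    ]
    return torch.stack(per_agent_q, dim=0).sum(dim=0)  # Q_tot
```

Methods such as QMIX replace the plain sum with a constrained (monotonic) mixing network; the constraint is what keeps each agent's decentralized argmax consistent with the centralized greedy action.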
“…Independent Actor with Centralized Critic (IACC) is another widespread instantiation of the CTDE paradigm. The state-of-the-art MARL policy-gradient approaches [9,10,12,17,18,31,32,33,34] utilize IACC in various ways and have achieved significant successes in solving many challenging multi-agent tasks.…”
Section: Decentralized Learning and Control
confidence: 99%
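As a companion to the factorization sketch above, here is a minimal illustration of the IACC pattern this statement describes: decentralized actors conditioned only on local observations, paired with a critic that sees global information during training. Again in PyTorch, and again all names, shapes, and the simple advantage-based loss are illustrative assumptions, not the architecture of any cited method:

```python
import torch
import torch.nn as nn

class DecentralizedActor(nn.Module):
    """Actor conditioned only on the agent's local observation."""
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs: torch.Tensor):
        return torch.distributions.Categorical(logits=self.net(obs))

class CentralizedCritic(nn.Module):
    """Critic trained on global information (here, concatenated joint observations)."""
    def __init__(self, joint_obs_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(joint_obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, joint_obs: torch.Tensor) -> torch.Tensor:
        return self.net(joint_obs).squeeze(-1)

def actor_loss(actor, critic, obs_i, joint_obs, act_i, returns):
    """Policy-gradient loss for one actor, using the centralized critic as a
    baseline (REINFORCE-with-baseline flavor). Only obs_i is needed at
    execution time; joint_obs is used during training only."""
    advantage = returns - critic(joint_obs).detach()
    log_prob = actor(obs_i).log_prob(act_i)
    return -(log_prob * advantage).mean()
```

The design point is the asymmetry: the critic can consume anything available offline (joint observations, global state), while each actor stays executable online from local information alone.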