“…Recently, the centralized training with decentralized execution (CTDE) [29,30] paradigm has attracted the most attention in the MARL community, as it exploits global information during offline training while allowing decentralized online execution based only on local information. Current MARL methods implement CTDE mainly in two patterns: one builds on the actor-critic framework and learns a centralized critic to compute each actor's policy gradients [9,10,12,17,18,31,32,33,34]; the other factorizes the centralized Q-value into decentralized per-agent Q-values via networks with various types of constraints [11,14,15,19,35,36]. Despite these methods having achieved remarkable …”
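The second pattern can be sketched with the simplest factorization, an additive (VDN-style) decomposition: the centralized Q-value is the sum of per-agent utilities, so greedily maximizing each local utility is consistent with maximizing the joint Q-value. This is a minimal toy illustration, not any cited method's implementation; the linear per-agent utilities and all names here are illustrative assumptions.

```python
# Minimal CTDE sketch via additive value factorization (VDN-style).
# Toy setup: linear per-agent utilities stand in for Q-networks.
import numpy as np

rng = np.random.default_rng(0)
n_agents, n_actions, obs_dim = 2, 3, 4

# Illustrative per-agent parameters (a stand-in for trained networks).
weights = [rng.standard_normal((obs_dim, n_actions)) for _ in range(n_agents)]

def agent_q(i, obs):
    """Decentralized utility Q_i: depends only on agent i's local observation."""
    return obs @ weights[i]

def q_tot(observations, actions):
    """Centralized training target: factorized as a sum of per-agent utilities."""
    return sum(agent_q(i, observations[i])[a] for i, a in enumerate(actions))

def decentralized_act(observations):
    """Execution: each agent greedily maximizes its own Q_i with local info only."""
    return [int(np.argmax(agent_q(i, observations[i]))) for i in range(n_agents)]

obs = [rng.standard_normal(obs_dim) for _ in range(n_agents)]
greedy = decentralized_act(obs)

# Because Q_tot is additive, per-agent argmax recovers the joint argmax of Q_tot.
best_joint = max(
    ((a0, a1) for a0 in range(n_actions) for a1 in range(n_actions)),
    key=lambda joint: q_tot(obs, joint),
)
print(greedy, list(best_joint))
```

The additivity constraint is what makes decentralized execution safe here; richer factorizations (e.g. monotonic mixing) relax this sum while preserving the same argmax-consistency property.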