“…Recently, the centralized training with decentralized execution (CTDE) [29,30] paradigm has attracted the most attention in the MARL community, as it exploits global information during offline training while allowing decentralized online execution based only on local information. Current MARL methods implement CTDE mainly in two patterns: one builds on the actor-critic framework and learns a centralized critic to compute each actor's policy gradients [9,10,12,17,18,31,32,33,34]; the other factorizes the centralized Q-value into decentralized per-agent Q-values via networks with various types of constraints [11,14,15,19,35,36]. Despite these methods having achieved remarkable …”
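The second pattern can be sketched with the simplest factorization, an additive (VDN-style) decomposition: the centralized Q-value is the sum of per-agent utilities, so greedily maximizing each local utility is consistent with maximizing the joint Q-value. This is a minimal toy illustration, not any cited method's implementation; the linear per-agent utilities and all names here are illustrative assumptions.

```python
# Minimal CTDE sketch via additive value factorization (VDN-style).
# Toy setup: linear per-agent utilities stand in for Q-networks.
import numpy as np

rng = np.random.default_rng(0)
n_agents, n_actions, obs_dim = 2, 3, 4

# Illustrative per-agent parameters (a stand-in for trained networks).
weights = [rng.standard_normal((obs_dim, n_actions)) for _ in range(n_agents)]

def agent_q(i, obs):
    """Decentralized utility Q_i: depends only on agent i's local observation."""
    return obs @ weights[i]

def q_tot(observations, actions):
    """Centralized training target: factorized as a sum of per-agent utilities."""
    return sum(agent_q(i, observations[i])[a] for i, a in enumerate(actions))

def decentralized_act(observations):
    """Execution: each agent greedily maximizes its own Q_i with local info only."""
    return [int(np.argmax(agent_q(i, observations[i]))) for i in range(n_agents)]

obs = [rng.standard_normal(obs_dim) for _ in range(n_agents)]
greedy = decentralized_act(obs)

# Because Q_tot is additive, per-agent argmax recovers the joint argmax of Q_tot.
best_joint = max(
    ((a0, a1) for a0 in range(n_actions) for a1 in range(n_actions)),
    key=lambda joint: q_tot(obs, joint),
)
print(greedy, list(best_joint))
```

The additivity constraint is what makes decentralized execution safe here; richer factorizations (e.g. monotonic mixing) relax this sum while preserving the same argmax-consistency property.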