“…These ideas have been applied to the MARL setting [Guestrin et al, 2001a, 2002, Sunehag et al, 2017, Rashid et al, 2018, Zhang et al, 2018a,b, Zhang and Zavlanos, 2019] and have proven successful in experiments, but they lack theoretical guarantees and non-asymptotic analysis. A recent line of work has formally considered spatial decay of correlation assumptions for nearest-neighbor dynamics and designed decentralized algorithms based on policy gradient and actor-critic methods [Qu and Li, 2019, Qu et al, 2020a, Lin et al, 2020, Qu et al, 2020b], establishing non-asymptotic convergence guarantees towards a stationary point, but not towards an optimal policy. An application of the same principles to the setting of mean-field MARL [Yang et al, 2018] can be found in Gu et al [2021], where the authors show that a neural-network-based version of the actor-critic algorithm can achieve global convergence.…”