“…For general Markov games, however, it is known that blindly applying independent/decentralized Q-learning can easily diverge, due to the non-stationarity of the environment [Tan, 1993, Boutilier, 1996, Matignon et al., 2012]. Despite this, the decentralized paradigm has still attracted continuing research interest [Arslan and Yuksel, 2017, Pérolat et al., 2018, Daskalakis et al., 2020, Tian et al., 2020, Wei et al., 2021], since it is much more scalable and more natural for agents to implement. Notably, none of these works is as decentralized or as general as our algorithm.…”
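To make the non-stationarity point concrete, here is a minimal sketch (not from the quoted works; the game, payoffs, and parameter values are illustrative assumptions) of two *independent* tabular Q-learners in a repeated two-action matrix game. Each agent updates its own Q-values as if facing a fixed environment, yet that "environment" includes the other agent's moving policy, which is exactly the source of non-stationarity the passage describes.

```python
import random

# Illustrative coordination game (payoffs are assumptions, not from the paper):
# (action of agent 0, action of agent 1) -> (reward to agent 0, reward to agent 1)
PAYOFF = {
    (0, 0): (1.0, 1.0),
    (0, 1): (0.0, 0.0),
    (1, 0): (0.0, 0.0),
    (1, 1): (1.0, 1.0),
}

def independent_q_learning(steps=5000, alpha=0.1, eps=0.1, seed=0):
    """Each agent runs stateless Q-learning over its OWN actions only,
    never observing the other agent's action or Q-values."""
    rng = random.Random(seed)
    q = [[0.0, 0.0], [0.0, 0.0]]  # q[i][a]: agent i's estimate for action a
    for _ in range(steps):
        acts = []
        for i in range(2):
            if rng.random() < eps:  # epsilon-greedy exploration
                acts.append(rng.randrange(2))
            else:
                acts.append(0 if q[i][0] >= q[i][1] else 1)
        rewards = PAYOFF[tuple(acts)]
        for i in range(2):
            # Independent update: from agent i's viewpoint the reward looks
            # like noise from a fixed environment, but it actually depends on
            # the other agent's (changing) policy -- the non-stationarity.
            a = acts[i]
            q[i][a] += alpha * (rewards[i] - q[i][a])
    return q

q = independent_q_learning()
```

In this toy coordination game the learners often settle on a joint action, but nothing in the update accounts for the opponent, which is why such blind application can diverge or cycle in general-sum Markov games.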