“…This versatile approach

1.2 Related Work

(Stationary) Multi-agent reinforcement learning. Numerous works have been devoted to learning equilibria in (stationary) multi-agent systems, including zero-sum Markov games [Bai et al., 2020], general-sum Markov games [Mao et al., 2022, Song et al., 2021, Daskalakis et al., 2022, Wang et al., 2023, Cui et al., 2023], Markov potential games [Leonardos et al., 2021, Song et al., 2021, Ding et al., 2022, Cui et al., 2023], congestion games [Cui et al., 2022], extensive-form games [Kozuno et al., 2021], and partially observable Markov games [Liu et al., 2022]. These works aim to learn equilibria efficiently from bandit feedback, with efficiency measured by either regret or sample complexity.…”