2021
DOI: 10.48550/arxiv.2110.14555
Preprint

V-Learning -- A Simple, Efficient, Decentralized Algorithm for Multiagent RL

Abstract: A major challenge of multiagent reinforcement learning (MARL) is the curse of multiagents, where the size of the joint action space scales exponentially with the number of agents. This remains a bottleneck for designing efficient MARL algorithms even in a basic scenario with finitely many states and actions. This paper resolves this challenge for the model of episodic Markov games. We design a new class of fully decentralized algorithms, V-learning, which provably learns Nash equilibria (in the two-player…
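The abstract only states what V-learning achieves, not how it works. As a rough illustration of the kind of per-agent update such a decentralized scheme implies, here is a minimal Python sketch for an episodic tabular setting. The class name, step size, exploration-bonus constant, and EXP3-style bandit subroutine are illustrative assumptions, not the paper's exact algorithm; the point is that every quantity the agent maintains is indexed only by its own actions, never by the joint action.

```python
import numpy as np

class VLearningAgent:
    """Hypothetical per-agent sketch of a V-learning-style update.

    The agent keeps only V-value estimates over states plus a bandit
    subroutine over its OWN actions, so nothing here scales with the
    joint action space of all agents.
    """

    def __init__(self, num_states, num_actions, horizon, c_bonus=1.0):
        self.H = horizon
        self.A = num_actions
        # Optimistic value tables, one per step h; V[H] is terminal (= 0).
        self.V = np.full((horizon + 1, num_states), float(horizon))
        self.V[horizon] = 0.0
        self.visits = np.zeros((horizon, num_states), dtype=int)
        # Exponential-weights (EXP3-style) bandit state per (h, s).
        self.weights = np.ones((horizon, num_states, num_actions))
        self.c_bonus = c_bonus  # placeholder exploration constant

    def policy(self, h, s):
        w = self.weights[h, s]
        return w / w.sum()

    def act(self, h, s, rng):
        return int(rng.choice(self.A, p=self.policy(h, s)))

    def update(self, h, s, a, reward, next_s):
        # Incremental optimistic V-update with an H/(H+t)-style step size.
        self.visits[h, s] += 1
        t = self.visits[h, s]
        alpha = (self.H + 1) / (self.H + t)
        bonus = self.c_bonus * np.sqrt(self.H ** 3 / t)
        target = reward + self.V[h + 1, next_s] + bonus
        self.V[h, s] = min(self.H,
                           (1 - alpha) * self.V[h, s] + alpha * target)
        # Importance-weighted bandit update on the played action only:
        # the agent never observes the other agents' actions.
        p = self.policy(h, s)[a]
        loss = (self.H - reward - self.V[h + 1, next_s]) / self.H
        eta = np.sqrt(np.log(self.A) / (self.A * t))
        self.weights[h, s, a] *= np.exp(-eta * loss / p)
```

In a usage loop, each agent independently calls `act` to sample its action and `update` with its observed reward and next state; the decentralization emphasized in the abstract shows up as the absence of any joint-action table.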

Cited by 20 publications (40 citation statements); references 36 publications (47 reference statements).
“…Zero-sum Markov games have been widely studied since the seminal work [Shapley, 1953]. When the transition kernel is unknown, different sampling oracles are utilized to acquire samples, including online sampling [Xie et al., 2020a, Liu et al., 2021, Jin et al., 2021a, Song et al., 2021] and generative-model sampling [Sidford et al., 2020, Cui and Yang, 2020, Zhang et al., 2020, Jia et al., 2019]. For the offline sampling oracle, Zhang et al. [2021b] provide a finite-sample bound for a decentralized algorithm with network communication under a uniform concentration assumption, and Abe and Kaneko [2020] consider offline policy evaluation, again under the uniform concentration assumption.…”
Section: Related Work
confidence: 99%
“…(Wei et al., 2017; Xie et al., 2020; Liu et al., 2021; Chen et al., 2021; Jin et al., 2021b; Huang et al., 2021), as well as learning (Coarse) Correlated Equilibria in multi-player general-sum MGs, e.g. (Liu et al., 2021; Song et al., 2021; Jin et al., 2021a; Mao and Başar, 2022). As the settings of MGs in these works do not allow imperfect information, these results do not imply results for learning IIEFGs.…”
Section: Related Work
confidence: 87%
“…MARL. There is a long line of research on the theoretical aspects of MARL, mainly focusing on MGs (Littman, 1994; Xie et al., 2020; Zhang et al., 2020; Liu et al., 2021; Jin et al., 2021). This literature is only partially related for two reasons: it aims to converge to an equilibrium rather than to minimize individual regret, and MGs assume the agents share the current state, whereas in our model different agents traverse different trajectories (i.e., modelling our setting as an MG requires an exponentially large state space).…”
Section: Related Work
confidence: 99%
“…Cooperative multi-agent reinforcement learning (MARL; see Zhang et al. (2021a)) has achieved impressive empirical success in many applications such as cyber-physical systems (Adler & Blue, 2002; Wang et al., 2016), finance (Lee et al., 2002; 2007), and sensor/communication networks (Cortes et al., 2004; Choi et al., 2009). The theoretical work on MARL has focused on either Markov Games (MGs) (Jin et al., 2021), where the goal is to converge to an equilibrium, or stochastic MDPs (Lidard et al., 2021).…”
Section: Introduction
confidence: 99%