2021
DOI: 10.48550/arxiv.2107.14702
Preprint

Towards General Function Approximation in Zero-Sum Markov Games

Abstract: This paper considers two-player zero-sum finite-horizon Markov games with simultaneous moves. The study focuses on the challenging setting where the value function or the model is parameterized by a general function class. Provably efficient algorithms are developed for both the decoupled and the coordinated settings. In the decoupled setting, where the agent controls a single player and plays against an arbitrary opponent, we propose a new model-free algorithm. The sample complexity is governed by the Minimax Eluder …
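As context for the abstract, the central per-state object in a simultaneous-move zero-sum game is a matrix game whose Nash (minimax) value is backed up through the finite horizon. The snippet below is a minimal, illustrative sketch, not the paper's algorithm; the function name `nash_value` and the use of `scipy.optimize.linprog` are this example's own choices. It computes the max player's equilibrium strategy and the game value for a single payoff matrix via linear programming.

```python
# Illustrative sketch: Nash value of a zero-sum matrix (stage) game via LP.
import numpy as np
from scipy.optimize import linprog

def nash_value(A: np.ndarray):
    """Return (value, max-player strategy) for payoff matrix A,
    where A[i, j] is the payoff to the max player."""
    m, n = A.shape
    # Decision variables z = [x_1, ..., x_m, v]; maximize v <=> minimize -v.
    c = np.zeros(m + 1)
    c[-1] = -1.0
    # For every opponent action j: sum_i A[i, j] * x_i >= v, i.e. -A^T x + v <= 0.
    A_ub = np.hstack([-A.T, np.ones((n, 1))])
    b_ub = np.zeros(n)
    # x must lie on the probability simplex.
    A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])
    b_eq = np.array([1.0])
    bounds = [(0, None)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[-1], res.x[:m]

if __name__ == "__main__":
    # Matching pennies: value 0, uniform equilibrium strategies.
    A = np.array([[1.0, -1.0], [-1.0, 1.0]])
    v, x = nash_value(A)
    print(v, x)
```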

Cited by 8 publications (13 citation statements)
References 15 publications (16 reference statements)
“…Algorithms with asymptotic convergence have been proposed in the early works of Hu and Wellman (2003), Littman (2001), and Hansen et al. (2013). A recent line of work studies the non-asymptotic sample complexity of learning Nash equilibria in two-player zero-sum Markov games (Xie et al., 2020; Zhang et al., 2020; Chen et al., 2021; Huang et al., 2021) and of learning various equilibria in general-sum Markov games, building on techniques for learning single-agent Markov Decision Processes sample-efficiently (Azar et al., 2017; Jin et al., 2018). Learning the Nash equilibrium in general-sum Markov games is much harder than in zero-sum Markov games.…”
Section: Related Work (mentioning)
confidence: 99%
“…Early works consider casting this min-max problem over the sequence-form policies as a linear program (Koller and Megiddo, 1992; Von Stengel, 1996; Koller et al., 1996). First-order algorithms were later proposed for solving the min-max problem directly, in particular by using proper regularizers such as the dilated KL distance (Gilpin et al., 2008; Hoda et al., 2010; Kroer et al., 2015; Lee et al., 2021). Another prevalent approach is Counterfactual Regret Minimization (CFR) (Zinkevich et al., 2007), which works by minimizing (local) counterfactual regrets at each infoset separately, using any regret minimization algorithm over the probability simplex such as Regret Matching or Hedge (Tammelin, 2014; Zhou et al., 2020; Farina et al., 2020b).…”
Section: Related Work (mentioning)
confidence: 99%
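The local regret minimizer referenced in the citation above can be any no-regret algorithm over the probability simplex. Below is a minimal, illustrative sketch of Regret Matching, a standard such choice; the code is not taken from any of the cited works, and the class name `RegretMatcher` is invented for this example. The idea is to play each action in proportion to its positive cumulative regret.

```python
# Minimal sketch of Regret Matching over a probability simplex, the kind of
# local regret minimizer CFR runs at each infoset.  Names are illustrative.
import numpy as np

class RegretMatcher:
    def __init__(self, n_actions: int):
        self.cum_regret = np.zeros(n_actions)

    def strategy(self) -> np.ndarray:
        """Play in proportion to positive cumulative regret; uniform if none."""
        pos = np.maximum(self.cum_regret, 0.0)
        total = pos.sum()
        if total > 0:
            return pos / total
        return np.full(len(self.cum_regret), 1.0 / len(self.cum_regret))

    def update(self, action_utilities: np.ndarray) -> None:
        """Accumulate regret: each action's utility minus the utility of the mix played."""
        sigma = self.strategy()
        self.cum_regret += action_utilities - sigma @ action_utilities

if __name__ == "__main__":
    # Self-play in rock-paper-scissors: average strategies approach uniform.
    payoff = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]], dtype=float)
    p1, p2 = RegretMatcher(3), RegretMatcher(3)
    avg1 = np.zeros(3)
    for _ in range(10000):
        s1, s2 = p1.strategy(), p2.strategy()
        avg1 += s1
        p1.update(payoff @ s2)         # expected utility of each of p1's actions
        p2.update(-(payoff.T @ s1))    # p2's utilities are the negation
    print(avg1 / 10000)
```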
“…(Sidford et al., 2020; Zhang et al., 2020; Daskalakis et al., 2020; Wei et al., 2021) or in the exploration setting, e.g. (Wei et al., 2017; Xie et al., 2020; Liu et al., 2021; Chen et al., 2021; Jin et al., 2021b; Huang et al., 2021), as well as learning (Coarse) Correlated Equilibria in multi-player general-sum MGs, e.g. (Liu et al., 2021; Song et al., 2021; Jin et al., 2021a; Mao and Başar, 2022).…”
Section: Related Work (mentioning)
confidence: 99%
“…Contemporaneously, Jin et al. [2021b] and Huang et al. [2021] studied multi-agent RL with function approximation in finite-horizon episodic zero-sum stochastic games, also using the optimism principle and providing regret guarantees.…”
Section: Multi-agent Reinforcement Learning (mentioning)
confidence: 99%