2021
DOI: 10.48550/arxiv.2107.14702
Preprint

Towards General Function Approximation in Zero-Sum Markov Games

Abstract: This paper considers two-player zero-sum finite-horizon Markov games with simultaneous moves. The study focuses on the challenging setting where the value function or the model is parameterized by a general function class. Provably efficient algorithms are developed for both the decoupled and the coordinated settings. In the decoupled setting, where the agent controls a single player and plays against an arbitrary opponent, we propose a new model-free algorithm. The sample complexity is governed by the Minimax Eluder …
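As context for the abstract, the central per-state object in a simultaneous-move zero-sum game is a matrix game whose Nash (minimax) value is backed up through the finite horizon. The snippet below is a minimal, illustrative sketch, not the paper's algorithm; the function name `nash_value` and the use of `scipy.optimize.linprog` are this example's own choices. It computes the max player's equilibrium strategy and the game value for a single payoff matrix via linear programming.

```python
# Illustrative sketch: Nash value of a zero-sum matrix (stage) game via LP.
import numpy as np
from scipy.optimize import linprog

def nash_value(A: np.ndarray):
    """Return (value, max-player strategy) for payoff matrix A,
    where A[i, j] is the payoff to the max player."""
    m, n = A.shape
    # Decision variables z = [x_1, ..., x_m, v]; maximize v <=> minimize -v.
    c = np.zeros(m + 1)
    c[-1] = -1.0
    # For every opponent action j: sum_i A[i, j] * x_i >= v, i.e. -A^T x + v <= 0.
    A_ub = np.hstack([-A.T, np.ones((n, 1))])
    b_ub = np.zeros(n)
    # x must lie on the probability simplex.
    A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])
    b_eq = np.array([1.0])
    bounds = [(0, None)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[-1], res.x[:m]

if __name__ == "__main__":
    # Matching pennies: value 0, uniform equilibrium strategies.
    A = np.array([[1.0, -1.0], [-1.0, 1.0]])
    v, x = nash_value(A)
    print(v, x)
```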

Cited by 8 publications (13 citation statements)
References 15 publications (16 reference statements)
“…Algorithms with asymptotic convergence have been proposed in the early works of Hu and Wellman (2003), Littman (2001), and Hansen et al. (2013). A recent line of work studies the non-asymptotic sample complexity of learning Nash equilibria in two-player zero-sum Markov games (Xie et al., 2020; Zhang et al., 2020; Chen et al., 2021; Huang et al., 2021) and of learning various equilibria in general-sum Markov games, building on techniques for learning single-agent Markov Decision Processes sample-efficiently (Azar et al., 2017; Jin et al., 2018). Learning the Nash equilibrium in general-sum Markov games is much harder than in zero-sum Markov games.…”
Section: Related Work (mentioning)
confidence: 99%
“…Early works consider casting this min-max problem over the sequence-form policies as a linear program (Koller and Megiddo, 1992; Von Stengel, 1996; Koller et al., 1996). First-order algorithms were later proposed for solving the min-max problem directly, in particular by using proper regularizers such as the dilated KL distance (Gilpin et al., 2008; Hoda et al., 2010; Kroer et al., 2015; Lee et al., 2021). Another prevalent approach is Counterfactual Regret Minimization (CFR) (Zinkevich et al., 2007), which works by minimizing (local) counterfactual regrets at each infoset separately, using any regret minimization algorithm over the probability simplex such as Regret Matching or Hedge (Tammelin, 2014; Zhou et al., 2020; Farina et al., 2020b).…”
Section: Related Work (mentioning)
confidence: 99%
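The local regret minimizer referenced in the citation above can be any no-regret algorithm over the probability simplex. Below is a minimal, illustrative sketch of Regret Matching, a standard such choice; the code is not taken from any of the cited works, and the class name `RegretMatcher` is invented for this example. The idea is to play each action in proportion to its positive cumulative regret.

```python
# Minimal sketch of Regret Matching over a probability simplex, the kind of
# local regret minimizer CFR runs at each infoset.  Names are illustrative.
import numpy as np

class RegretMatcher:
    def __init__(self, n_actions: int):
        self.cum_regret = np.zeros(n_actions)

    def strategy(self) -> np.ndarray:
        """Play in proportion to positive cumulative regret; uniform if none."""
        pos = np.maximum(self.cum_regret, 0.0)
        total = pos.sum()
        if total > 0:
            return pos / total
        return np.full(len(self.cum_regret), 1.0 / len(self.cum_regret))

    def update(self, action_utilities: np.ndarray) -> None:
        """Accumulate regret: each action's utility minus the utility of the mix played."""
        sigma = self.strategy()
        self.cum_regret += action_utilities - sigma @ action_utilities

if __name__ == "__main__":
    # Self-play in rock-paper-scissors: average strategies approach uniform.
    payoff = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]], dtype=float)
    p1, p2 = RegretMatcher(3), RegretMatcher(3)
    avg1 = np.zeros(3)
    for _ in range(10000):
        s1, s2 = p1.strategy(), p2.strategy()
        avg1 += s1
        p1.update(payoff @ s2)         # expected utility of each of p1's actions
        p2.update(-(payoff.T @ s1))    # p2's utilities are the negation
    print(avg1 / 10000)
```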
“…(Sidford et al., 2020; Zhang et al., 2020; Daskalakis et al., 2020; Wei et al., 2021) or in the exploration setting, e.g. (Wei et al., 2017; Xie et al., 2020; Liu et al., 2021; Chen et al., 2021; Jin et al., 2021b; Huang et al., 2021), as well as learning (Coarse) Correlated Equilibria in multi-player general-sum MGs, e.g. (Liu et al., 2021; Song et al., 2021; Jin et al., 2021a; Mao and Başar, 2022).…”
Section: Related Work (mentioning)
confidence: 99%
“…Contemporaneously, Jin et al. [2021b] and Huang et al. [2021] studied multi-agent RL with function approximation in finite-horizon episodic zero-sum stochastic games, also using the optimism principle and providing regret guarantees.…”
Section: Multi-agent Reinforcement Learning (mentioning)
confidence: 99%