2022
DOI: 10.48550/arxiv.2202.01752
Preprint

Near-Optimal Learning of Extensive-Form Games with Imperfect Information

Abstract: This paper resolves the open question of designing near-optimal algorithms for learning imperfect-information extensive-form games from bandit feedback. We present the first line of algorithms that require only O((XA + YB)/ε²) episodes of play to find an ε-approximate Nash equilibrium in two-player zero-sum games, where X, Y are the number of information sets and A, B are the number of actions for the two players. This improves upon the best known sample complexity of O((X²A + Y²B)/ε²) by a factor of O…
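For a sense of scale, the gap between the two bounds in the abstract can be seen by plugging in hypothetical game sizes (our numbers, purely illustrative and not from the paper):

```latex
% Illustrative comparison of the two sample-complexity bounds quoted in the abstract.
% The game sizes below are hypothetical and chosen only to show the scale of the gap.
\[
  \text{prior bound: } O\!\left(\frac{X^2 A + Y^2 B}{\varepsilon^2}\right),
  \qquad
  \text{this paper: } O\!\left(\frac{XA + YB}{\varepsilon^2}\right).
\]
\[
  \text{E.g., } X = Y = 10^3,\; A = B = 10,\; \varepsilon = 0.1:
  \quad
  \frac{X^2 A + Y^2 B}{\varepsilon^2} = 2\times 10^{9},
  \qquad
  \frac{XA + YB}{\varepsilon^2} = 2\times 10^{6}.
\]
```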

Cited by 6 publications (34 citation statements). References 19 publications.
“…In particular, we provide a way to combine any gradient estimator (unbiased or biased), any exploration strategy, any interactive strategy, with any full-feedback regret minimizer to assemble a bandit regret minimization method. • We demonstrate that the most recent bandit regret minimization methods, i.e., MCCFR [Lanctot et al., 2009, Farina et al., 2020b, Farina and Sandholm, 2021], IXOMD [Kozuno et al., 2021] and balanced OMD/CFR [Bai et al., 2022], can be analyzed as a special case of our framework. We first present the theoretical bounds for biased gradient estimation bandit regret minimization methods in IIEGs.…”
Section: Introduction (mentioning)
Confidence: 89%
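The quoted template (pick an exploration/sampling scheme, estimate the loss gradient from bandit feedback, feed the estimate to a full-feedback regret minimizer) can be illustrated with a minimal single-decision-point sketch. This is our own illustration under simplifying assumptions, not the tree-form algorithms of the cited works; the names Hedge, bandit_round, and loss_fn are hypothetical.

```python
import numpy as np

class Hedge:
    """Full-feedback regret minimizer over n actions (multiplicative weights)."""
    def __init__(self, n, eta=0.1):
        self.weights = np.ones(n)
        self.eta = eta

    def strategy(self):
        return self.weights / self.weights.sum()

    def observe(self, loss_vector):
        # Standard Hedge update; expects a loss value for *every* action.
        self.weights *= np.exp(-self.eta * loss_vector)


def bandit_round(minimizer, loss_fn, rng, gamma=0.05):
    """One round of the estimate-then-feed template: sample an action,
    observe only its loss, build an importance-weighted estimate of the
    full loss vector, and hand that estimate to the full-feedback minimizer."""
    p = minimizer.strategy()
    n = len(p)
    q = (1.0 - gamma) * p + gamma / n      # simple uniform-exploration mixing
    a = rng.choice(n, p=q)
    loss_a = loss_fn(a)                    # bandit feedback: one scalar loss
    estimate = np.zeros(n)
    estimate[a] = loss_a / q[a]            # unbiased importance-weighted estimate
    minimizer.observe(estimate)
    return a, loss_a
```

With losses in [0, 1] the estimate is bounded by n/γ, which is why some exploration mixing (or an IX-style correction, as in the next quote) usually accompanies this kind of estimator.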
“…So in this setting, before using the full-feedback regret minimizer, it is necessary to estimate the loss gradient ℓ^t by v(z^t). There are two ways: the unbiased estimator [Lanctot et al., 2009, Zhou et al., 2019, Farina et al., 2020b, 2021b, Farina and Sandholm, 2021] and the biased estimator [Kozuno et al., 2021, Bai et al., 2022]. The former enables the expectation of the output estimated gradient…”
Section: Equilibrium Finding with Regret Minimization (mentioning)
Confidence: 99%
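The unbiased-versus-biased distinction drawn in this quote can be sketched in a generic bandit setting. This is our own illustration under simplifying assumptions, not the exact estimators of the cited papers; iw_estimate and ix_estimate are hypothetical names.

```python
import numpy as np

def iw_estimate(n, a, loss_a, q):
    """Unbiased importance-weighted estimate: its expectation over the
    sampled action a ~ q equals the true loss vector."""
    est = np.zeros(n)
    est[a] = loss_a / q[a]
    return est

def ix_estimate(n, a, loss_a, q, gamma):
    """Implicit-exploration (IX) style estimate: the extra +gamma in the
    denominator makes the estimate biased (downward) but bounded by 1/gamma."""
    est = np.zeros(n)
    est[a] = loss_a / (q[a] + gamma)
    return est

# Quick Monte Carlo check of (un)biasedness on a toy problem.
rng = np.random.default_rng(0)
n, gamma = 4, 0.05
q = np.array([0.1, 0.2, 0.3, 0.4])
true_loss = np.array([0.9, 0.5, 0.2, 0.7])
sum_iw, sum_ix, T = np.zeros(n), np.zeros(n), 100_000
for _ in range(T):
    a = rng.choice(n, p=q)
    sum_iw += iw_estimate(n, a, true_loss[a], q)
    sum_ix += ix_estimate(n, a, true_loss[a], q, gamma)
print("true    :", true_loss)
print("IW mean :", sum_iw / T)   # close to true_loss (unbiased)
print("IX mean :", sum_ix / T)   # systematically below true_loss (biased)
```

The trade-off: the unbiased estimate can be as large as 1/q[a], while the biased IX estimate never exceeds 1/γ, which is the kind of boundedness that high-probability analyses typically exploit.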