2022
DOI: 10.48550/arxiv.2207.14211
Preprint

Regret Minimization and Convergence to Equilibria in General-sum Markov Games

Abstract: An abundance of recent impossibility results establish that regret minimization in Markov games with adversarial opponents is both statistically and computationally intractable. Nevertheless, none of these results preclude the possibility of regret minimization under the assumption that all parties adopt the same learning procedure. In this work, we present the first (to our knowledge) algorithm for learning in general-sum Markov games that provides sublinear regret guarantees when executed by all agents. The …
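For orientation, a standard way to formalize the per-player regret in this setting (a textbook-style definition with assumed notation, not a formula quoted from the paper) compares each player's realized value against the best fixed policy in hindsight:

```latex
% Player i's external regret over T episodes: \pi^t is the joint policy
% played in episode t, V_i^{\pi}(s_1) is player i's value from the initial
% state, and the max ranges over player i's fixed deviation policies.
\mathrm{Reg}_i(T) = \max_{\pi_i^\dagger} \sum_{t=1}^{T} V_i^{\pi_i^\dagger,\, \pi_{-i}^{t}}(s_1) \;-\; \sum_{t=1}^{T} V_i^{\pi^{t}}(s_1)
```

Sublinear regret then means Reg_i(T) = o(T) for every player i when all players run the same procedure.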

Cited by 1 publication (2 citation statements)
References: 12 publications

“…To the best of our knowledge, this is the first finite-sample analysis of best-response type independent learning dynamics that are convergent and rational for Markov games. Most existing MARL algorithms are either symmetric across players but not payoff-based, e.g., Cen et al. (2021, 2022); Zhang et al. (2022a); Zeng et al. (2022); Erez et al. (2022), or not symmetric and thus not rational, e.g., Daskalakis et al. (2020); Zhao et al. (2021); Zhang et al. (2021b); Alacaoglu et al. (2022), or do not have finite-sample guarantees, e.g., Leslie et al. (2020); Sayin et al. (2021); Baudin and Laraki (2022b).…”
Section: Contributions
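In this literature, a learning rule is typically called rational if it converges to a best response whenever the opponents play stationary policies, and convergent if joint play converges when every player uses the rule. To make "symmetric", "payoff-based", and "best-response type" concrete, here is a minimal sketch of such dynamics for a two-player matrix game; all names and the softmax smoothing choice are illustrative, not taken from any of the cited papers:

```python
import numpy as np

def softmax(q, tau):
    """Smoothed best response: softmax over estimated action values."""
    z = (q - q.max()) / tau
    p = np.exp(z)
    return p / p.sum()

def independent_smoothed_br(payoffs, T=50_000, tau=0.1):
    """Symmetric, payoff-based learning in a two-player matrix game.
    Each player i observes only its own realized reward, maintains
    action-value estimates q[i], and plays a smoothed best response
    to them. payoffs[i][a0, a1] is player i's reward under joint
    action (a0, a1)."""
    q = [np.zeros(payoffs[i].shape[i]) for i in range(2)]
    counts = [np.zeros_like(qi) for qi in q]
    for _ in range(T):
        # Both players run the *same* rule (symmetric across players).
        probs = [softmax(q[i], tau) for i in range(2)]
        a = [np.random.choice(len(probs[i]), p=probs[i]) for i in range(2)]
        for i in range(2):
            r = payoffs[i][a[0], a[1]]              # own payoff only
            counts[i][a[i]] += 1
            q[i][a[i]] += (r - q[i][a[i]]) / counts[i][a[i]]  # running mean
    return [softmax(q[i], tau) for i in range(2)]

# Example: matching pennies (zero-sum). The smoothed dynamics should
# drift toward the uniform mixed equilibrium.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
p0, p1 = independent_smoothed_br([A, -A])
```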
“…The zero-sum case is more challenging since there is no off-the-shelf Lyapunov function, which the potential function in the potential game case serves as. For non-potential game settings, symmetric variants of policy gradient methods have been proposed, but have only been studied under the full-information setting without finite-sample guarantees (Cen et al., 2021, 2022; Pattathil et al., 2022; Zhang et al., 2022a; Zeng et al., 2022; Erez et al., 2022), with the exception of Wei et al. (2021); Chen et al. (2021a). However, the learning algorithm in Wei et al. (2021) requires some coordination between the players when sampling, and is thus not completely independent; that in Chen et al. (2021a) is extragradient-based and not best-response-type, and needs some stage-based sampling process that also requires coordination across players.…”
Section: Sample-efficient MARL
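The Lyapunov remark refers to a standard fact about exact potential games (stated here for intuition with assumed notation, not quoted from either paper): a single function records every player's unilateral payoff change, so it improves monotonically along best-response dynamics, whereas zero-sum games admit no such off-the-shelf function:

```latex
% Exact potential game: a function \Phi exists such that, for every
% player i, all actions a_i, a_i', and every opponent profile a_{-i},
u_i(a_i', a_{-i}) - u_i(a_i, a_{-i}) = \Phi(a_i', a_{-i}) - \Phi(a_i, a_{-i})
% Every strictly improving unilateral step strictly increases \Phi, so
% \Phi serves as a Lyapunov function for best-response dynamics.
```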