2022
DOI: 10.48550/arxiv.2202.04129
Preprint

Independent Policy Gradient for Large-Scale Markov Potential Games: Sharper Rates, Function Approximation, and Game-Agnostic Convergence

Abstract: We examine global non-asymptotic convergence properties of policy gradient methods for multi-agent reinforcement learning (RL) problems in Markov potential games (MPGs). To learn a Nash equilibrium of an MPG in which the size of the state space and/or the number of players can be very large, we propose new independent policy gradient algorithms that are run by all players in tandem. When there is no uncertainty in the gradient evaluation, we show that our algorithm finds an ε-Nash equilibrium with O(1/ε^2) iteration …
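
To make the setup concrete, here is a minimal, illustrative sketch of independent policy gradient in the simplest possible potential game: a single-state, two-player identical-interest game with exact gradients (the "no uncertainty" case). This is not the paper's algorithm; it omits the Markov (multi-state) structure, function approximation, and sample-based gradient estimation, and the payoff matrix, step size, and iteration count are arbitrary assumptions chosen for illustration.

```python
# Minimal sketch (illustrative assumptions, not the paper's algorithm):
# two players run softmax policy gradient independently and "in tandem"
# on a single-state, identical-interest (hence potential) game,
# each using the exact gradient of its own expected payoff.
import numpy as np

PAYOFF = np.array([[1.0, 0.0],
                   [0.0, 2.0]])   # shared payoff; rows: player 1's actions, cols: player 2's (assumed)

def softmax(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

theta = [np.zeros(2), np.zeros(2)]   # each player's policy logits
eta = 0.5                            # step size (assumed)

for _ in range(200):
    pi1, pi2 = softmax(theta[0]), softmax(theta[1])
    q1 = PAYOFF @ pi2                # player 1's expected payoff per action
    q2 = PAYOFF.T @ pi1              # player 2's expected payoff per action
    # Exact softmax policy-gradient (advantage) update, performed by each
    # player using only its own payoff information.
    theta[0] += eta * pi1 * (q1 - pi1 @ q1)
    theta[1] += eta * pi2 * (q2 - pi2 @ q2)

print(softmax(theta[0]), softmax(theta[1]))  # both policies concentrate on the (2, 2) Nash equilibrium
```

In the paper's Markov setting, the analogous updates are applied per state to each player's own value function, with function approximation for large state spaces or player counts; the sketch above only illustrates the independent, simultaneous nature of the updates.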

Cited by 3 publications (3 citation statements) · References 45 publications

Citation statements:

“…Later, Sayin et al. [2021] developed a decentralized Q-learning dynamic that is symmetric, but with only asymptotic convergence guarantees in the zero-sum setting. More recently, for Markov potential games, which also include identical-interest MGs as an example, such independent policy gradient algorithms are also shown to converge [Leonardos et al., 2021, Zhang et al., 2021, Fox et al., 2021, Ding et al., 2022]. For episodic MGs, [Jin et al., 2021, Song et al., 2021, Mao and Başar, 2022] establish the regret guarantees of decentralized learning algorithms in the online exploration setting.…”
Section: Independent Learning in MGs
confidence: 98%
“…In contrast, this paper considers the full-information setting and proposes new algorithms achieving the faster O(T^{-3/4}) rate for learning CCE in general-sum MGs. Another recent line of work considers learning NE in Markov Potential Games [58,27,47,15,59], which can be seen as a cooperative-type subclass of general-sum MGs.…”
Section: Introduction
confidence: 99%
“…Nevertheless, these efforts must grapple with a series of strong lower bounds for computing even weaker solution concepts like coarse correlated equilibria in turn-based stochastic games [12,27]. On that account, a recent line of work has focused on establishing convergence in specific subclasses of stochastic games, such as min-max [7,11,32,47,48,58] and common interest potential games [13,31,61]. However, despite these encouraging results, the general case remains particularly elusive.…”
Section: Introduction
confidence: 99%