2022
DOI: 10.48550/arxiv.2202.04129
Preprint

Independent Policy Gradient for Large-Scale Markov Potential Games: Sharper Rates, Function Approximation, and Game-Agnostic Convergence

Abstract: We examine global non-asymptotic convergence properties of policy gradient methods for multi-agent reinforcement learning (RL) problems in Markov potential games (MPGs). To learn a Nash equilibrium of an MPG in which the size of the state space and/or the number of players can be very large, we propose new independent policy gradient algorithms that are run by all players in tandem. When there is no uncertainty in the gradient evaluation, we show that our algorithm finds an ε-Nash equilibrium with O(1/ε^2) iteration …
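
To make the setup concrete, here is a minimal, illustrative sketch of independent policy gradient in the simplest possible potential game: a single-state, two-player identical-interest game with exact gradients (the "no uncertainty" case). This is not the paper's algorithm; it omits the Markov (multi-state) structure, function approximation, and sample-based gradient estimation, and the payoff matrix, step size, and iteration count are arbitrary assumptions chosen for illustration.

```python
# Minimal sketch (illustrative assumptions, not the paper's algorithm):
# two players run softmax policy gradient independently and "in tandem"
# on a single-state, identical-interest (hence potential) game,
# each using the exact gradient of its own expected payoff.
import numpy as np

PAYOFF = np.array([[1.0, 0.0],
                   [0.0, 2.0]])   # shared payoff; rows: player 1's actions, cols: player 2's (assumed)

def softmax(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

theta = [np.zeros(2), np.zeros(2)]   # each player's policy logits
eta = 0.5                            # step size (assumed)

for _ in range(200):
    pi1, pi2 = softmax(theta[0]), softmax(theta[1])
    q1 = PAYOFF @ pi2                # player 1's expected payoff per action
    q2 = PAYOFF.T @ pi1              # player 2's expected payoff per action
    # Exact softmax policy-gradient (advantage) update, performed by each
    # player using only its own payoff information.
    theta[0] += eta * pi1 * (q1 - pi1 @ q1)
    theta[1] += eta * pi2 * (q2 - pi2 @ q2)

print(softmax(theta[0]), softmax(theta[1]))  # both policies concentrate on the (2, 2) Nash equilibrium
```

In the paper's Markov setting, the analogous updates are applied per state to each player's own value function, with function approximation for large state spaces or player counts; the sketch above only illustrates the independent, simultaneous nature of the updates.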

Cited by 3 publications (3 citation statements) · References 45 publications

Citation statements:

“…Later, Sayin et al. [2021] developed a decentralized Q-learning dynamic that is symmetric, but with only asymptotic convergence guarantees in the zero-sum setting. More recently, for Markov potential games, which also include identical-interest MGs as an example, such independent policy gradient algorithms are also shown to converge [Leonardos et al., 2021, Zhang et al., 2021, Fox et al., 2021, Ding et al., 2022]. For episodic MGs, [Jin et al., 2021, Song et al., 2021, Mao and Başar, 2022] establish the regret guarantees of decentralized learning algorithms in the online exploration setting.…”
Section: Independent Learning in MGs
confidence: 98%
“…In contrast, this paper considers the full-information setting and proposes new algorithms achieving the faster O(T^{-3/4}) rate for learning CCE in general-sum MGs. Another recent line of work considers learning NE in Markov Potential Games [58,27,47,15,59], which can be seen as a cooperative-type subclass of general-sum MGs.…”
Section: Introduction
confidence: 99%
“…Nevertheless, these efforts must grapple with a series of strong lower bounds for computing even weaker solution concepts like coarse correlated equilibria in turn-based stochastic games [12,27]. On that account, a recent line of work has focused on establishing convergence in specific subclasses of stochastic games, such as min-max [7,11,32,47,48,58] and common interest potential games [13,31,61]. However, despite these encouraging results, the general case remains particularly elusive.…”
Section: Introduction
confidence: 99%