2022
DOI: 10.48550/arxiv.2206.02640
Preprint

Policy Optimization for Markov Games: Unified Framework and Faster Convergence

Abstract: This paper studies policy optimization algorithms for multi-agent reinforcement learning. We begin by proposing an algorithm framework for two-player zero-sum Markov Games in the full-information setting, where each iteration consists of a policy update step at each state using a certain matrix game algorithm, and a value update step with a certain learning rate. This framework unifies many existing and new policy optimization algorithms. We show that the state-wise average policy of this algorithm converges t…
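To make the framework in the abstract concrete, here is a minimal sketch of one iteration under assumptions of our own: a tabular zero-sum Markov game with reward tensor R and transition tensor P, and a hypothetical matrix_game_step callback standing in for "a certain matrix game algorithm" (e.g., OMWU or OGDA). All names below are illustrative, not the paper's code.

```python
import numpy as np

def one_iteration(V, x, y, R, P, gamma, alpha, matrix_game_step):
    """One sweep of the assumed framework: at each state, a policy update via a
    matrix-game subroutine on the stage game, then a value update with rate alpha."""
    S, A, B = R.shape                 # states, max-player actions, min-player actions
    for s in range(S):
        # Stage game at state s: immediate reward plus discounted continuation value.
        G = R[s] + gamma * (P[s] @ V)      # P[s]: (A, B, S), V: (S,)  ->  G: (A, B)
        # Policy update step: the matrix-game algorithm updates both strategies.
        x[s], y[s] = matrix_game_step(G, x[s], y[s])
        # Value update step with learning rate alpha (incremental averaging).
        V[s] = (1 - alpha) * V[s] + alpha * (x[s] @ G @ y[s])
    return V, x, y
```

Different choices of matrix_game_step and of the schedule for alpha would then recover, per the abstract, different existing and new policy optimization algorithms as instances of the framework.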


Cited by 1 publication (3 citation statements)
References: 20 publications
“…A classical averaging stepsize from Jin et al. (2018) is utilized by the critic so that errors accumulate slowly and last-iterate convergence is obtained. Zhang et al. (2022) propose a modified OFTRL method, where the min-player and the max-player employ a lower and an upper bound for the value functions, respectively. The lower and upper bounds are computed from approximate Q-functions in past iterations…”
Section: Related Work (mentioning)
confidence: 99%
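The averaging stepsize of Jin et al. (2018) referred to in this excerpt is, as far as we can tell, the horizon-rescaled rate

\alpha_t = \frac{H + 1}{H + t}, \qquad t = 1, 2, \ldots

where H is the horizon and t counts the updates at a given state; its characteristic property is that the induced weights concentrate on recent iterates while shrinking slowly, so per-step critic errors are averaged out rather than amplified.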
“…Global convergence. The proof of global convergence for Averaging OGDA adapts several standard techniques from the Markov games literature (Zhang et al., 2022; Wei et al., 2021). We attach its proof in Appendix C.1 for completeness…”
Section: Global Convergence and Geometric Boundedness of Averaging OGDA (mentioning)
confidence: 99%
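For context, the plain OGDA (optimistic gradient descent ascent) update on a smooth zero-sum objective f(x, y), which "Averaging OGDA" presumably builds on, is

x_{t+1} = x_t - \eta \left( 2 \nabla_x f(x_t, y_t) - \nabla_x f(x_{t-1}, y_{t-1}) \right),
y_{t+1} = y_t + \eta \left( 2 \nabla_y f(x_t, y_t) - \nabla_y f(x_{t-1}, y_{t-1}) \right),

with the averaging applied to the value estimates across iterations; the exact variant is specified in the citing paper.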