2022
DOI: 10.48550/arxiv.2206.02640
Preprint

Policy Optimization for Markov Games: Unified Framework and Faster Convergence

Abstract: This paper studies policy optimization algorithms for multi-agent reinforcement learning. We begin by proposing an algorithm framework for two-player zero-sum Markov Games in the full-information setting, where each iteration consists of a policy update step at each state using a certain matrix game algorithm, and a value update step with a certain learning rate. This framework unifies many existing and new policy optimization algorithms. We show that the state-wise average policy of this algorithm converges t…
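To make the framework in the abstract concrete, here is a minimal sketch of one iteration under assumptions of our own: a tabular zero-sum Markov game with reward tensor R and transition tensor P, and a hypothetical matrix_game_step callback standing in for "a certain matrix game algorithm" (e.g., OMWU or OGDA). All names below are illustrative, not the paper's code.

```python
import numpy as np

def one_iteration(V, x, y, R, P, gamma, alpha, matrix_game_step):
    """One sweep of the assumed framework: at each state, a policy update via a
    matrix-game subroutine on the stage game, then a value update with rate alpha."""
    S, A, B = R.shape                 # states, max-player actions, min-player actions
    for s in range(S):
        # Stage game at state s: immediate reward plus discounted continuation value.
        G = R[s] + gamma * (P[s] @ V)      # P[s]: (A, B, S), V: (S,)  ->  G: (A, B)
        # Policy update step: the matrix-game algorithm updates both strategies.
        x[s], y[s] = matrix_game_step(G, x[s], y[s])
        # Value update step with learning rate alpha (incremental averaging).
        V[s] = (1 - alpha) * V[s] + alpha * (x[s] @ G @ y[s])
    return V, x, y
```

Different choices of matrix_game_step and of the schedule for alpha would then recover, per the abstract, different existing and new policy optimization algorithms as instances of the framework.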


Cited by 1 publication (3 citation statements)
References: 20 publications
“…A classical averaging stepsize from Jin et al. (2018) is utilized by the critic so that errors accumulate slowly and last-iterate convergence is obtained. Zhang et al. (2022) propose a modified OFTRL method, where the min-player and the max-player employ a lower and an upper bound for the value functions, respectively. The lower and upper bounds are computed from approximate Q-functions in past iterations…”
Section: Related Work (mentioning)
confidence: 99%
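The averaging stepsize of Jin et al. (2018) referred to in this excerpt is, as far as we can tell, the horizon-rescaled rate

\alpha_t = \frac{H + 1}{H + t}, \qquad t = 1, 2, \ldots

where H is the horizon and t counts the updates at a given state; its characteristic property is that the induced weights concentrate on recent iterates while shrinking slowly, so per-step critic errors are averaged out rather than amplified.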
“…Global convergence. The proof of global convergence for Averaging OGDA adapts several standard techniques from the Markov games literature (Zhang et al., 2022; Wei et al., 2021). We attach its proof in Appendix C.1 for completeness…”
Section: Global Convergence and Geometric Boundedness of Averaging OGDA (mentioning)
confidence: 99%
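For context, the plain OGDA (optimistic gradient descent ascent) update on a smooth zero-sum objective f(x, y), which "Averaging OGDA" presumably builds on, is

x_{t+1} = x_t - \eta \left( 2 \nabla_x f(x_t, y_t) - \nabla_x f(x_{t-1}, y_{t-1}) \right),
y_{t+1} = y_t + \eta \left( 2 \nabla_y f(x_t, y_t) - \nabla_y f(x_{t-1}, y_{t-1}) \right),

with the averaging applied to the value estimates across iterations; the exact variant is specified in the citing paper.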