“…For general Markov games, however, it is known that blindly applying independent/decentralized Q-learning can easily diverge, due to the non-stationarity of the environment [Tan, 1993, Boutilier, 1996, Matignon et al., 2012]. Despite this, the decentralized paradigm has still attracted continuing research interest [Arslan and Yuksel, 2017, Pérolat et al., 2018, Daskalakis et al., 2020, Tian et al., 2020, Wei et al., 2021], since it is much more scalable and more natural for agents to implement. Notably, none of these works is as decentralized or as general as our algorithm.…”
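To make the non-stationarity point concrete, here is a minimal sketch (not from the quoted works; the game, payoffs, and parameter values are illustrative assumptions) of two *independent* tabular Q-learners in a repeated two-action matrix game. Each agent updates its own Q-values as if facing a fixed environment, yet that "environment" includes the other agent's moving policy, which is exactly the source of non-stationarity the passage describes.

```python
import random

# Illustrative coordination game (payoffs are assumptions, not from the paper):
# (action of agent 0, action of agent 1) -> (reward to agent 0, reward to agent 1)
PAYOFF = {
    (0, 0): (1.0, 1.0),
    (0, 1): (0.0, 0.0),
    (1, 0): (0.0, 0.0),
    (1, 1): (1.0, 1.0),
}

def independent_q_learning(steps=5000, alpha=0.1, eps=0.1, seed=0):
    """Each agent runs stateless Q-learning over its OWN actions only,
    never observing the other agent's action or Q-values."""
    rng = random.Random(seed)
    q = [[0.0, 0.0], [0.0, 0.0]]  # q[i][a]: agent i's estimate for action a
    for _ in range(steps):
        acts = []
        for i in range(2):
            if rng.random() < eps:  # epsilon-greedy exploration
                acts.append(rng.randrange(2))
            else:
                acts.append(0 if q[i][0] >= q[i][1] else 1)
        rewards = PAYOFF[tuple(acts)]
        for i in range(2):
            # Independent update: from agent i's viewpoint the reward looks
            # like noise from a fixed environment, but it actually depends on
            # the other agent's (changing) policy -- the non-stationarity.
            a = acts[i]
            q[i][a] += alpha * (rewards[i] - q[i][a])
    return q

q = independent_q_learning()
```

In this toy coordination game the learners often settle on a joint action, but nothing in the update accounts for the opponent, which is why such blind application can diverge or cycle in general-sum Markov games.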