Abstract: In this paper we establish efficient and uncoupled learning dynamics so that, when employed by all players in a general-sum multiplayer game, the swap regret of each player after T repetitions of the game is bounded by O(log T), improving over the prior best bound of O(log^4 T). At the same time, we guarantee optimal O(√T) swap regret in the adversarial regime as well. To obtain these results, our primary contribution is to show that when all players follow our dynamics with a time-invariant learning rate…
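Concretely, the quantity being bounded can be computed from the play history alone. Below is a minimal Python sketch (illustrative only, not the paper's dynamics) that measures the swap regret of a sequence of mixed strategies against a sequence of loss vectors, using the standard per-action decomposition of the best swap deviation.

```python
import numpy as np

def swap_regret(plays, losses):
    """Swap regret of mixed strategies `plays` (T x A) against
    loss vectors `losses` (T x A).

    The best swap function decomposes per action: for each action a,
    reroute all probability mass placed on a to the single action b
    minimizing the mass-weighted cumulative loss.
    """
    plays, losses = np.asarray(plays), np.asarray(losses)
    realized = np.sum(plays * losses)      # sum_t <p_t, l_t>
    # M[a, b] = sum_t p_t(a) * l_t(b): loss if mass on a is rerouted to b
    M = plays.T @ losses
    best_swap = np.sum(M.min(axis=1))      # optimal rerouting target per action
    return realized - best_swap
```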
“…The O(√(XAT)) trigger regret asserted in Theorem 7 improves over Theorem 6 by a factor of Π₁, and matches the information-theoretic lower bound up to poly(H) and log factors. By the online-to-batch conversion (Appendix B.2), Theorem 7 also implies an O(H^4 XA/ε^2) sample complexity for learning EFCE under bandit feedback (assuming the same game sizes for all m players).…”
Section: Results (supporting)
confidence: 65%
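To spell out the online-to-batch step referenced above (a sketch, assuming the regret bound takes the form H^2·√(XAT), which is consistent with the quoted H^4 dependence but not stated explicitly here): the average correlated strategy is an ε-approximate EFCE once the per-episode regret falls below ε,

```latex
\frac{H^{2}\sqrt{XAT}}{T} \;\le\; \varepsilon
\quad\Longleftrightarrow\quad
T \;\ge\; \frac{H^{4}\,XA}{\varepsilon^{2}},
```

which matches the O(H^4 XA/ε^2) sample complexity quoted above.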
“…Two important special cases of Φ-regret are the internal regret and swap regret in normal-form games [43, 8]. A recent line of work developed algorithms with O(polylog T) swap regret bounds in normal-form games [2, 3].…”
A conceptually appealing approach for learning Extensive-Form Games (EFGs) is to convert them to Normal-Form Games (NFGs). This approach enables us to directly translate state-of-the-art techniques and analyses in NFGs to learning EFGs, but typically suffers from computational intractability due to the exponential blow-up of the game size introduced by the conversion. In this paper, we address this problem in natural and important setups for the Φ-Hedge algorithm, a generic algorithm capable of learning a large class of equilibria for NFGs. We show that Φ-Hedge can be directly used to learn Nash Equilibria (zero-sum settings), Normal-Form Coarse Correlated Equilibria (NFCCE), and Extensive-Form Correlated Equilibria (EFCE) in EFGs. We prove that, in those settings, the Φ-Hedge algorithms are equivalent to standard Online Mirror Descent (OMD) algorithms for EFGs with suitable dilated regularizers, and run in polynomial time. This new connection further allows us to design and analyze a new class of OMD algorithms based on modifying its log-partition function. In particular, we design an improved algorithm with balancing techniques that achieves a sharp O(√(XAT)) EFCE-regret under bandit feedback in an EFG with X information sets, A actions, and T episodes. To the best of our knowledge, this is the first such rate, and it matches the information-theoretic lower bound.
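For intuition about how Φ-Hedge operates in the normal-form case, the sketch below instantiates it for the swap-deviation set of a single player (a minimal sketch assuming full loss feedback; the function name `phi_hedge_swap` and loop structure are ours, and this is not the paper's EFG algorithm): each action runs its own Hedge instance over rerouting targets, the resulting row-stochastic matrix Q_t is formed, and the player plays its stationary distribution, i.e. the fixed point p = Qᵀp.

```python
import numpy as np

def phi_hedge_swap(loss_stream, n_actions, eta=0.1):
    """Minimal Phi-Hedge for the swap-deviation set in a normal-form game.

    One Hedge instance per action a maintains weights over where a's
    probability mass should be rerouted; row-normalizing gives a
    stochastic matrix Q_t, and the iterate p_t is its stationary
    distribution, i.e. the fixed point p = Q^T p.
    """
    # S[a, b]: cumulative loss charged to instance a for rerouting to b
    S = np.zeros((n_actions, n_actions))
    history = []
    for loss in loss_stream:                # loss: length-n_actions vector
        Q = np.exp(-eta * S)
        Q /= Q.sum(axis=1, keepdims=True)   # row-stochastic deviation matrix
        # stationary distribution: left eigenvector of Q for eigenvalue 1
        vals, vecs = np.linalg.eig(Q.T)
        p = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
        p = np.abs(p) / np.abs(p).sum()
        history.append(p)
        # instance a is charged the loss weighted by the mass p[a] it routed
        S += np.outer(p, loss)
    return history
```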
“…Such last-iterate convergence results for adaptive methods are relatively rare in the literature, and most of them assume perfect oracle feedback. To the best of our knowledge, the closest antecedents to our result are [2, 39], but both works make the more stringent cocoercivity assumption and consider an adaptive learning rate that is the same for all players. In particular, their learning rates are computed with global feedback and are thus less suitable for the learning-in-games setup.…”
Section: Compared To Theorem 5 We Can Now Only Bound (mentioning)
confidence: 93%
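To illustrate the local-vs-global feedback distinction drawn in this quote, here is a hypothetical per-player schedule (an AdaGrad-style sketch of ours, not the rule from [2, 39]): each player scales its step size using only the gradients it has itself observed.

```python
import numpy as np

class LocalAdaptiveRate:
    """Per-player adaptive step size computed from local feedback only.

    An AdaGrad-style schedule, eta_t = eta0 / sqrt(1 + sum of squared
    norms of the player's own observed gradients). Contrast with a
    globally computed rate, which would require every player's feedback.
    """
    def __init__(self, eta0=1.0):
        self.eta0 = eta0
        self.sq_sum = 0.0

    def step_size(self, grad):
        self.sq_sum += float(np.dot(grad, grad))
        return self.eta0 / np.sqrt(1.0 + self.sq_sum)
```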
“…This is made possible thanks to a clear distinction between additive and multiplicative noise; the latter has previously been explored in the game-theoretic context only by [4, 39] for the class of cocoercive games. Relaxing the cocoercivity assumption is a nontrivial challenge, as evidenced by the small number of works that establish last-iterate convergence results for stochastic algorithms in monotone games. Except for [27], mentioned above, this was achieved either through mini-batching [10, 30], Tikhonov regularization / Halpern iteration [37], or both [11].…”
Section: B Further Related Work (mentioning)
confidence: 99%
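For reference, the gap between the two assumption classes discussed in this quote can be stated precisely. Writing V for the game's gradient operator, the standard definitions are

```latex
\text{monotone:}\quad \langle V(x)-V(y),\,x-y\rangle \;\ge\; 0,
\qquad
\beta\text{-cocoercive:}\quad \langle V(x)-V(y),\,x-y\rangle \;\ge\; \beta\,\lVert V(x)-V(y)\rVert^{2}.
```

Every β-cocoercive operator is monotone (and (1/β)-Lipschitz), but not conversely, which is why replacing cocoercivity with plain monotonicity is the harder regime.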
“…Because of this mechanism (and the fact that players are changing their actions incrementally from one round to the next), the learners are facing a much more "predictable" sequence of events. As a result, there have been a number of research threads in the literature showing that it is possible to attain near-constant regret (i.e., at most polylogarithmic) in different classes of games, from the works of [14, 33] on finite two-player zero-sum games, to more recent works on general-sum finite games [1, 2, 16], extensive-form games [20], and even continuous games [28].…”
We examine the problem of regret minimization when the learner is involved in a continuous game with other optimizing agents: in this case, if all players follow a no-regret algorithm, it is possible to achieve significantly lower regret relative to fully adversarial environments. We study this problem in the context of variationally stable games (a class of continuous games which includes all convex-concave and monotone games), and when the players only have access to noisy estimates of their individual payoff gradients. If the noise is additive, the game-theoretic and purely adversarial settings enjoy similar regret guarantees; however, if the noise is multiplicative, we show that the learners can, in fact, achieve constant regret. We achieve this faster rate via an optimistic gradient scheme with learning rate separation: that is, the method's extrapolation and update steps are tuned to different schedules, depending on the noise profile. Subsequently, to eliminate the need for delicate hyperparameter tuning, we propose a fully adaptive method that smoothly interpolates between worst- and best-case regret guarantees.
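A minimal sketch of the learning-rate-separation idea (ours, with constant step sizes `gamma` and `eta` standing in for the paper's schedules, and a generic `grad_oracle` supplying the noisy feedback):

```python
import numpy as np

def optimistic_grad_separated(grad_oracle, x0, T, gamma=0.1, eta=0.01):
    """Optimistic gradient with separate extrapolation/update step sizes.

    Each round extrapolates with step gamma using the previous gradient
    estimate, queries the (possibly noisy) oracle at the extrapolated
    point, then updates the base iterate with a distinct step eta.
    """
    x = np.asarray(x0, dtype=float)
    g_prev = np.zeros_like(x)
    iterates = []
    for _ in range(T):
        x_lead = x - gamma * g_prev     # extrapolation step (rate gamma)
        g = grad_oracle(x_lead)         # noisy feedback at the leading point
        x = x - eta * g                 # update step (separate rate eta)
        g_prev = g
        iterates.append(x.copy())
    return iterates
```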
An abundance of recent impossibility results establishes that regret minimization in Markov games with adversarial opponents is both statistically and computationally intractable. Nevertheless, none of these results preclude the possibility of regret minimization under the assumption that all parties adopt the same learning procedure. In this work, we present the first (to our knowledge) algorithm for learning in general-sum Markov games that provides sublinear regret guarantees when executed by all agents. The bounds we obtain are for swap regret, and thus, along the way, imply convergence to a correlated equilibrium. Our algorithm is decentralized, computationally efficient, and does not require any communication between agents. Our key observation is that online learning via policy optimization in Markov games essentially reduces to a form of weighted regret minimization, with unknown weights determined by the path length of the agents' policy sequence. Consequently, controlling the path length leads to weighted regret objectives for which sufficiently adaptive algorithms provide sublinear regret guarantees.
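To illustrate the weighted-regret objective the reduction produces, here is a minimal weighted multiplicative-weights sketch (illustrative only; the weights w_t are supplied externally here, whereas in the reduction described above they are determined by the policy path length, which is not modeled):

```python
import numpy as np

def weighted_hedge(weighted_losses, n_actions, eta=0.1):
    """Hedge against per-round weighted losses.

    Controls the weighted regret
        sum_t w_t * (<p_t, l_t> - l_t(a))   for every fixed action a,
    by feeding Hedge the scaled losses w_t * l_t.
    """
    S = np.zeros(n_actions)             # cumulative weighted loss
    history = []
    for w, loss in weighted_losses:     # pairs (w_t, l_t)
        p = np.exp(-eta * S)
        p /= p.sum()
        history.append(p)
        S += w * np.asarray(loss)
    return history
```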