2022
DOI: 10.48550/arxiv.2204.11417
Preprint

Uncoupled Learning Dynamics with $O(\log T)$ Swap Regret in Multiplayer Games

Abstract: In this paper we establish efficient and uncoupled learning dynamics so that, when employed by all players in a general-sum multiplayer game, the swap regret of each player after $T$ repetitions of the game is bounded by $O(\log T)$, improving over the prior best bounds of $O(\log^4 T)$. At the same time, we guarantee optimal $O(\sqrt{T})$ swap regret in the adversarial regime as well. To obtain these results, our primary contribution is to show that when all players follow our dynamics with a time-invariant learning r…
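For readers unfamiliar with the quantity being bounded, the following is a minimal sketch of how the swap regret of a single player can be computed from a sequence of mixed strategies and utility vectors. It uses the standard textbook definition (the best gain from any fixed remapping of actions in hindsight). It is not the learning dynamics from the paper, and the function name `swap_regret` and its argument layout are assumptions made for illustration.

```python
def swap_regret(strategies, utilities):
    """Swap regret of one player over T rounds (illustrative helper, not from the paper).

    strategies: list of T mixed strategies, each a list of n probabilities.
    utilities:  list of T utility vectors, each a list of n payoffs.
    """
    n = len(strategies[0])
    # gain[a][b] = sum_t x_t(a) * u_t(b): the utility collected if all the
    # probability mass placed on action a had been played on action b instead.
    gain = [[0.0] * n for _ in range(n)]
    for x, u in zip(strategies, utilities):
        for a in range(n):
            for b in range(n):
                gain[a][b] += x[a] * u[b]
    # The best swap function can be chosen independently for each action a,
    # so the maximum over all maps phi: A -> A decomposes into a sum of row maxima.
    return sum(max(gain[a]) - gain[a][a] for a in range(n))


# Two rounds, two actions: the player always picks the action with utility 0.
print(swap_regret([[1, 0], [0, 1]], [[0, 1], [1, 0]]))
```

In this toy run, swapping each action for the other would have earned one unit per round, so the swap regret is 2. A bound of $O(\log T)$ says this quantity grows only logarithmically in the number of rounds when all players use the dynamics.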

Cited by 4 publications (22 citation statements)
References 15 publications
“…The $O(\sqrt{XAT})$ trigger regret asserted in Theorem 7 improves over Theorem 6 by a factor of Π 1, and matches the information-theoretic lower bound up to $\mathrm{poly}(H)$ and log factors. By the online-to-batch conversion (Appendix B.2), Theorem 7 also implies an $O(H^4 XA/\varepsilon^2)$ sample complexity for learning EFCE under bandit feedback (assuming same game sizes for all $m$ players).…”
Section: Results (supporting)
confidence: 65%
“…Two important special cases of Φ-regret are the internal regret and swap regret in normal-form games [43,8]. A recent line of work developed algorithms with $O(\mathrm{polylog}\, T)$ swap regret bound in normal-form games [2,3].…”
Section: Related Work (mentioning)
confidence: 99%
“…Such last-iterate convergence results for adaptive methods are relatively rare in the literature, and most of them assume perfect oracle feedback. To the best of our knowledge, the closest antecedents to our result are [2,39], but both works make the more stringent cocoercive assumptions and consider adaptive learning rate that is the same for all the players. In particular, their learning rates are computed with global feedback and are thus less suitable for the learning-in-game setup.…”
Section: Compared To Theorem 5 We Can Now Only Bound (mentioning)
confidence: 93%
“…This is made possible thanks to a clear distinction between additive and multiplicative noise; the latter has only been formerly explored in the game-theoretic context by [4,39] for the class of cocoercive games. Relaxing the cocoercivity assumption is a nontrivial challenge, as testified by the small number of works that establish last-iterate convergence results of stochastic algorithms for monotone games. Except for [27] mentioned above, this was achieved either through mini-batching [10,30], Tikhonov regularization / Halpern iteration [37], or both [11].…”
Section: B Further Related Work (mentioning)
confidence: 99%