2022 IEEE 61st Conference on Decision and Control (CDC)
DOI: 10.1109/cdc51059.2022.9992950
Improved Rates for Derivative Free Gradient Play in Strongly Monotone Games

Cited by 8 publications (4 citation statements); References 13 publications
“…By employing a barrier-based method, Lin et al [15] improved the convergence rate for strongly monotone games from O(1/t^{1/3}) to O(1/t^{1/2}). Similar convergence rates have also been reported in [16], [17], [18]. Huang et al [19] developed two bandit learning algorithms by integrating residual pseudo-gradient estimates into single-call extra-gradient schemes that ensure a.s. convergence to critical points of pseudo-monotone plus games.…”
Section: Introduction (supporting)
confidence: 60%
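The quoted passage concerns payoff-based (bandit) learning, where each player builds a gradient estimate from the single cost value it observes. The following is a minimal sketch of one-point gradient estimation inside projected gradient play on a toy two-player strongly monotone quadratic game; the game, the step-size and exploration schedules, and all names here are illustrative assumptions, not taken from the paper or the citing works.

```python
import numpy as np

# Sketch: derivative-free (bandit) gradient play on an illustrative
# two-player strongly monotone quadratic game.  Each player queries
# only its own cost (zeroth-order feedback) and forms a one-point
# gradient estimate from a random unit-vector perturbation.

rng = np.random.default_rng(0)
d = 2          # dimension of each player's action
R = 5.0        # radius of the compact action set

def cost(i, x):
    """Player i's cost: 0.5*||x_i||^2 + 0.5*<x_i, x_{-i}> (strongly monotone pseudo-gradient)."""
    return 0.5 * x[i] @ x[i] + 0.5 * x[i] @ x[1 - i]

def proj(v):
    """Euclidean projection onto the ball of radius R."""
    n = np.linalg.norm(v)
    return v if n <= R else (R / n) * v

x = [rng.normal(size=d), rng.normal(size=d)]   # initial actions
for t in range(1, 5001):
    eta = 1.0 / (t + 20)       # conservative decaying step size
    delta = t ** (-0.25)       # shrinking exploration radius
    new_x = []
    for i in range(2):
        u = rng.normal(size=d)
        u /= np.linalg.norm(u)                  # random unit direction
        x_pert = [x[0].copy(), x[1].copy()]
        x_pert[i] = x[i] + delta * u            # play the perturbed action
        payoff = cost(i, x_pert)                # only a cost value is observed
        grad_est = (d / delta) * payoff * u     # one-point gradient estimate
        new_x.append(proj(x[i] - eta * grad_est))
    x = new_x

print("final actions (the exact equilibrium is the origin):", x)
```

The unique Nash equilibrium of this toy game is the origin, so the printed actions should be close to zero up to the noise of the one-point estimator.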
“…Contributions: In this work, we develop a bandit online learning algorithm and establish the a.s. convergence of the generated sequence of play under the regularity condition that the game is merely coherent, which is broader and more general than the games investigated in [14], [15], [16], [17], [18]. The proposed algorithm leverages optimistic mirror descent (OMD) [30], [31] and a single-call extra-gradient scheme as the backbone, which allows us to deal with the absence of strict coherence and reduces the query cost induced by the extra step.…”
Section: Introduction (mentioning)
confidence: 99%
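The backbone named in this passage is a single-call extra-gradient scheme, which reuses the previous operator query in place of the second evaluation that a standard extra-gradient step would require. Below is a minimal first-order sketch of that idea (a Popov-style optimistic update) on an illustrative strongly monotone linear operator; the cited algorithm instead works with bandit estimates and a mirror-descent geometry, so this is a simplified assumption-laden illustration, not the authors' method.

```python
import numpy as np

# Sketch: single-call (optimistic / past) extra-gradient iteration on an
# illustrative strongly monotone linear operator F(x) = A x.
# Only one operator query is made per iteration; the previous value
# stands in for the usual second extra-gradient evaluation.
A = np.array([[1.0, 0.5],
              [0.5, 1.0]])

def F(x):
    return A @ x

x = np.array([2.0, -3.0])
g_prev = F(x)            # one extra query at initialization only
eta = 0.2                # step size below the stability threshold for this operator
for _ in range(200):
    g = F(x)                          # the only operator query this iteration
    x = x - eta * (2.0 * g - g_prev)  # optimistic step reusing the past query
    g_prev = g

print("iterate after 200 steps (the solution of F(x) = 0 is the origin):", x)
```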
“…The analysis of Mertikopoulos and Zhou [40] is subsequently extended by Bravo et al [13] to learning with payoff-based "bandit feedback", that is, when players observe only the payoff of the action that they played. At around the same time, Tatarenko and Kamgarpour [63,64] use a Tikhonov regularization approach to obtain a series of comparable results for "merely monotone" games (i.e., monotone games that are not necessarily strictly monotone), whereas more recently, Drusvyatskiy and Ratliff [21] improve the rate of convergence in strongly monotone games to O(1/T^{1/2}). Finally, in a very recent paper, Bervoets et al [9] use stochastic approximation methodologies to prove the convergence of a payoff-based, dampened gradient approximation scheme in two other classes of one-dimensional concave games: games with strategic complements and ordinal potential games with isolated equilibria.…”
Section: Related Work (mentioning)
confidence: 99%
“…Assuming bandit feedback, i.e., agents only have access to zeroth-order oracles, [8] shows Nash equilibrium convergence for strongly monotone games, which are a special class of convex games. The convergence rate of the zeroth-order method in [8] is further improved in [30] by relying on the additional assumption that the Jacobian of the gradient function is Lipschitz continuous. Common to these works is that the agents perform symmetric updates using the same kind of information feedback.…”
Section: Introduction (mentioning)
confidence: 99%