2019
DOI: 10.48550/arxiv.1912.01698
Preprint

Online and Bandit Algorithms for Nonstationary Stochastic Saddle-Point Optimization

Abhishek Roy,
Yifang Chen,
Krishnakumar Balasubramanian
et al.

Abstract: Saddle-point optimization problems are an important class of optimization problems with applications to game theory, multi-agent reinforcement learning and machine learning. A majority of the rich literature available for saddle-point optimization has focused on the offline setting. In this paper, we study nonstationary versions of stochastic, smooth, strongly-convex and strongly-concave saddle-point optimization problems, in both online (or first-order) and multi-point bandit (or zeroth-order) settings. We fir…

Cited by 8 publications (11 citation statements)
References 20 publications
“…We refer the reader to (Daskalakis et al, 2021) for a more thorough discussion on the literature. Several recent works start considering the problem of learning over a sequence of non-stationary payoffs under different structures, including zero-sum matrix games (Cardoso et al, 2019;Fiez et al, 2021), convex-concave games (Roy et al, 2019) and strongly monotone games (Duvocelle et al, 2021). For zero-sum games, (Fiez et al, 2021) focuses on the periodic case and proves divergence results for a class of learning algorithms; (Cardoso et al, 2019) is the closest to our work, but as mentioned, we argue that their proposed measure (NE-regret) is not always appropriate (see Section 3.1).…”
Section: Measure
confidence: 99%
“…Another reasonable one is the tracking error $\sum_{t=1}^{T} \left( \|x_t - x_t^*\|_1 + \|y_t - y_t^*\|_1 \right)$ that directly measures the distance between $(x_t, y_t)$ and the equilibrium $(x_t^*, y_t^*)$ (assuming a unique equilibrium for simplicity). This is considered in (Roy et al, 2019; Balasubramanian & Ghadimi, 2021) (for different problems). However, tracking error bounds are in fact not well studied even when $A_t$ is fixed: the best known results still depend on some problem-dependent constant that can be arbitrarily large (Daskalakis & Panageas, 2019; Wei et al, 2021).…”
Section: Duality Gap
confidence: 99%
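To make the tracking-error metric quoted above concrete, here is a minimal sketch that sums the per-round L1 distances between the iterates and the (assumed unique) per-round equilibria. The iterate values are synthetic and not drawn from any cited paper.

```python
# Tracking error: sum over t of the L1 distances between the iterate
# (x_t, y_t) and the unique equilibrium (x*_t, y*_t) at round t.
def tracking_error(xs, ys, x_stars, y_stars):
    total = 0.0
    for x, y, x_star, y_star in zip(xs, ys, x_stars, y_stars):
        total += sum(abs(a - b) for a, b in zip(x, x_star))
        total += sum(abs(a - b) for a, b in zip(y, y_star))
    return total

# Toy example: two rounds with 2-dimensional x and y. In round 1 the
# iterate matches the equilibrium exactly; in round 2 it does not.
xs      = [[1.0, 0.0], [0.5, 0.5]]
ys      = [[0.0, 1.0], [0.5, 0.5]]
x_stars = [[1.0, 0.0], [0.0, 1.0]]
y_stars = [[0.0, 1.0], [1.0, 0.0]]
print(tracking_error(xs, ys, x_stars, y_stars))  # 2.0
```

As the quoted statement notes, the nonstationarity enters through the moving equilibria $(x_t^*, y_t^*)$: the metric penalizes how well the algorithm tracks them round by round.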
“…These approaches typically opt for a tracking error metric. In the more general online saddle point problem, one seeks to find a sequence of strategy pairs that minimize a saddle point regret [28,46].…”
Section: Related Work
confidence: 99%
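One common way to instantiate a saddle-point (duality-gap) style metric is for a bilinear zero-sum game $f_t(x, y) = x^\top A_t y$ over the simplex, where the per-round gap is $\max_{y'} x_t^\top A_t y' - \min_{x'} x'^\top A_t y_t$. The sketch below uses this bilinear special case as an assumption; it is not the exact regret definition from [28, 46].

```python
import numpy as np

def duality_gap(A, x, y):
    # For f(x, y) = x^T A y over the probability simplex, the inner
    # max/min are attained at vertices, so the gap reduces to
    # max over columns of x^T A minus min over rows of A y.
    return (x @ A).max() - (A @ y).min()

def saddle_point_regret(As, xs, ys):
    # Sum of per-round duality gaps over a sequence of payoff
    # matrices A_t (a nonstationary zero-sum matrix game).
    return sum(duality_gap(A, x, y) for A, x, y in zip(As, xs, ys))

A = np.array([[0.0, 1.0], [1.0, 0.0]])  # matching-pennies-like payoff
uniform = np.array([0.5, 0.5])
print(duality_gap(A, uniform, uniform))  # 0.0: uniform play is the equilibrium
```

A strategy pair has zero gap exactly when it is a Nash equilibrium of that round's game, so summing the gaps measures cumulative distance from equilibrium play.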
“…For convex-concave minimax optimization problems, there are some existing algorithms. For example, Roy et al [47] study zeroth-order Frank-Wolfe algorithms for strongly convex-strongly concave minimax optimization problems and provide non-asymptotic oracle complexity analysis. Beznosikov et al [4] present a zeroth-order saddle-point algorithm (zoSPA) with a total complexity of $O(\varepsilon^{-2})$.…”
confidence: 99%
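The zeroth-order methods mentioned above rely on gradient estimates built only from function evaluations. A standard two-point Gaussian-smoothing estimator, shown below as a generic sketch (not the exact estimator used by Roy et al or Beznosikov et al), perturbs the point along a random direction and differences the two evaluations.

```python
import numpy as np

def two_point_grad(f, x, mu=1e-4, rng=None):
    # Estimate grad f(x) using two function evaluations along a random
    # Gaussian direction u: ((f(x + mu*u) - f(x - mu*u)) / (2*mu)) * u.
    # The estimate is unbiased for the mu-smoothed surrogate of f.
    rng = rng if rng is not None else np.random.default_rng(0)
    u = rng.standard_normal(x.shape)
    return (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u

# Sanity check on a quadratic f(x) = ||x||^2, whose true gradient is 2x.
# A single sample is noisy, but the estimates average to 2x.
f = lambda x: float(x @ x)
x = np.array([1.0, -2.0])
g = two_point_grad(f, x)
```

In a saddle-point setting such an estimator is applied separately to the min variable $x$ and the max variable $y$ (with a sign flip for the ascent direction), which is why the cited bandit algorithms require multiple function evaluations per round.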