Near-Optimal High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise

Gorbunov, Eduard; Danilova, Marina; Shibaev, Innokentiy; Dvurechensky, Pavel; Gasnikov, Alexander

doi:10.48550/arxiv.2106.05958

Cited by 5 publications

(21 citation statements)

References 16 publications

(38 reference statements)

Supporting

Mentioning

Contrasting

Order By: Relevance

“…This rate has a worse dependence on k than our best scheme, but has an improved dependence on δ. However, we stress that the setting of [7,8] is different from ours. Indeed in [13], which is more closely related to our proposal, the corresponding result contains also the term log(1/δ) (see Theorem 2).…”

Section: Sgdmentioning

confidence: 88%

“…Despite obtaining near-optimal rates, both works suffer from either unpractical parameter settings or unrealistic assumptions. Moreover, differently from most results obtained in the light tailed case, in [13,8] the analysis is confined to a finite horizon, which is a limitation in many practical scenarios. Indeed, finite horizon methods cannot cope with online settings in which data arrives continuously in a potentially infinite stream of batches and the predictive model is updated accordingly.…”

Section: Introductionmentioning

confidence: 95%

“…Recently, motivated by the fact that real world datasets are abundant but of poor quality, a line of research has started investigating high-probability bounds with heavy tails assumptions, that is, with uniformly bounded variance noise. High probability bounds in this setting, have been proved in [13,8]. Despite obtaining near-optimal rates, both works suffer from either unpractical parameter settings or unrealistic assumptions.…”

Section: Introductionmentioning

confidence: 99%

“…Moreover, since the gradient estimator is the same as in [13], it suffers from the same practicability issues we discussed above and the author leaves as an open problem that of identifying a more practical estimator. Clipping strategies have already been used in [7,8], which provide convergence rates in high probability. In contrast to our work, they cover the unconstrained minimization of convex Lipschitz smooth functions and convex Lipschitz continuous functions respectively.…”

Section: Introductionmentioning

confidence: 99%

See 3 more Smart Citations

High Probability Bounds for Stochastic Subgradient Schemes with Heavy Tailed Noise

A.¹,

Paudice²,

Pontil³

et al. 2022

Preprint

View full text Add to dashboard Cite

In this work we study high probability bounds for stochastic subgradient methods under heavy tailed noise. In this case the noise is only assumed to have finite variance as opposed to a sub-Gaussian distribution for which it is known that standard subgradient methods enjoys high probability bounds. We analyzed a clipped version of the projected stochastic subgradient method, where subgradient estimates are truncated whenever they have large norms. We show that this clipping strategy leads both to near optimal any-time and finite horizon bounds for many classical averaging schemes. Preliminary experiments are shown to support the validity of the method.

show abstract

Section: Sgdmentioning

confidence: 88%

Section: Introductionmentioning

confidence: 95%

Section: Introductionmentioning

confidence: 99%

Section: Introductionmentioning

confidence: 99%

See 2 more Smart Citations

High Probability Bounds for Stochastic Subgradient Schemes with Heavy Tailed Noise

A.¹,

Paudice²,

Pontil³

et al. 2022

Preprint

View full text Add to dashboard Cite

show abstract

“…Using Markov inequality, one can easily derive high-probability bounds with non-desirable polynomial dependence on 1 /β, e.g., see the discussion in[Davis et al, 2021, Gorbunov et al, 2021.6 Each complexity result, which we derive, relies only on one or two of these assumptions simultaneously.…”

mentioning

confidence: 99%

Clipped Stochastic Methods for Variational Inequalities with Heavy-Tailed Noise

Gorbunov¹,

Danilova²,

Dobre³

et al. 2022

Preprint

Self Cite

View full text Add to dashboard Cite

Stochastic first-order methods such as Stochastic Extragradient (SEG) or Stochastic Gradient Descent-Ascent (SGDA) for solving smooth minimax problems and, more generally, variational inequality problems (VIP) have been gaining a lot of attention in recent years due to the growing popularity of adversarial formulations in machine learning. However, while high-probability convergence bounds are known to reflect the actual behavior of stochastic methods more accurately, most convergence results are provided in expectation. Moreover, the only known high-probability complexity results have been derived under restrictive sub-Gaussian (light-tailed) noise and bounded domain Assump. [Juditsky et al., 2011a]. In this work, we prove the first high-probability complexity results with logarithmic dependence on the confidence level for stochastic methods for solving monotone and structured non-monotone VIPs with non-sub-Gaussian (heavy-tailed) noise and unbounded domains. In the monotone case, our results match the best-known ones in the light-tails case [Juditsky et al., 2011a], and are novel for structured non-monotone problems such as negative comonotone, quasi-strongly monotone, and/or star-cocoercive ones. We achieve these results by studying SEG and SGDA with clipping. In addition, we numerically validate that the gradient noise of many practical GAN formulations is heavy-tailed and show that clipping improves the performance of SEG/SGDA.

show abstract

Algorithms with Gradient Clipping for Stochastic Optimization with Heavy-Tailed Noise

Danilova

2023

Dokl. Math.

View full text Add to dashboard Cite

Near-Optimal High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise

Cited by 5 publications

References 16 publications

High Probability Bounds for Stochastic Subgradient Schemes with Heavy Tailed Noise

High Probability Bounds for Stochastic Subgradient Schemes with Heavy Tailed Noise

Clipped Stochastic Methods for Variational Inequalities with Heavy-Tailed Noise

Algorithms with Gradient Clipping for Stochastic Optimization with Heavy-Tailed Noise

Contact Info

Product

Resources

About