2020
DOI: 10.1109/tcyb.2018.2874332

Primal Averaging: A New Gradient Evaluation Step to Attain the Optimal Individual Convergence

Cited by 20 publications (29 citation statements)
References 20 publications
“…It is easy to find that the gradient operation in (11) is imposed on f(w_t), in which w_t in fact is a weighted average of all past iterates. When the objective in (1) is μ-strongly convex, stochastic PA-PSG [29] is modified as …”
Section: PSG in Nonsmooth Optimization (mentioning)
confidence: 99%
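To make the primal-averaging step in the quoted statement concrete, here is a minimal sketch (not the authors' implementation) of a projected subgradient method in which the subgradient is evaluated at a weighted average of all past iterates. The step sizes, the averaging weights a_t, and the l2-ball stand-in for the feasible set Q are illustrative assumptions.

import numpy as np

def project_l2_ball(w, radius=1.0):
    # Euclidean projection onto an l2 ball, a stand-in for the set Q (assumption).
    norm = np.linalg.norm(w)
    return w if norm <= radius else w * (radius / norm)

def pa_psg(subgrad, w0, T=1000):
    # subgrad(w) returns a subgradient of the nonsmooth objective at w.
    w = w0.copy()
    w_bar = w0.copy()        # weighted average of past iterates
    weight_sum = 0.0
    for t in range(1, T + 1):
        a_t = 1.0                  # averaging weight (illustrative)
        eta_t = 1.0 / np.sqrt(t)   # step size (illustrative)
        g = subgrad(w_bar)         # PA step: subgradient taken at the averaged point
        w = project_l2_ball(w - eta_t * g)
        weight_sum += a_t
        w_bar += (a_t / weight_sum) * (w - w_bar)  # update the running weighted average
    return w_bar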
“…In [29], PA-like algorithms are extended to solve regularized nonsmooth loss optimization problems in stochastic settings. Specifically, PA-PSG (11) is reformulated as
w_t^+ = argmin_{w ∈ Q} { a_t ⟨ĝ_t, w⟩ + γ_t B(w, w_{t-1}^+) + a_t r(w) } …”
Section: Extension to Regularized Learning (mentioning)
confidence: 99%
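As an illustration only, the reformulated step above has a closed-form solution under two simplifying assumptions that are not taken from [29]: the Bregman divergence is B(w, v) = 0.5·||w − v||^2 and the regularizer is r(w) = lam·||w||_1, in which case the arg-min reduces to soft-thresholding.

import numpy as np

def soft_threshold(x, tau):
    # Proximal operator of tau * ||.||_1.
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def regularized_pa_step(w_prev_plus, g_hat, a_t, gamma_t, lam):
    # One step of the quoted update
    #   w_t^+ = argmin_w  a_t*<g_hat, w> + gamma_t*B(w, w_{t-1}^+) + a_t*r(w)
    # under the assumptions B(w, v) = 0.5*||w - v||^2 and r(w) = lam*||w||_1.
    v = w_prev_plus - (a_t / gamma_t) * g_hat      # gradient step on the linear term
    return soft_threshold(v, a_t * lam / gamma_t)  # prox of the l1 regularizer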
“…It should be mentioned that the derivations of optimal individual convergence so far only focus on the DA-like methods. Motivated by the averaging step in quasi-monotone DA [22], we recently presented a primal averaging (PA) strategy for PSG [29], in which the subgradient evaluation is imposed on the average of all past iterates. The PA strategy can accelerate the individual convergence of PSG to be optimal for nonsmooth convex problems.…”
Section: Introduction (mentioning)
confidence: 99%