2018
DOI: 10.1007/s11590-018-1331-1

On the linear convergence of the stochastic gradient method with constant step-size

Abstract: The strong growth condition (SGC) is known to be a sufficient condition for linear convergence of the stochastic gradient method using a constant step-size γ (SGM-CS). In this paper, we provide a necessary condition, for the linear convergence of SGM-CS, that is weaker than SGC. Moreover, when this necessary condition is violated up to an additive perturbation σ, we show that both the projected stochastic gradient method using a constant step-size (PSGM-CS) and the proximal stochastic gradient method exhibit linear convergence…
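A minimal sketch (an assumption on our part, not code from the paper) of the two methods named in the abstract: SGM-CS, the stochastic gradient method with a constant step-size γ, and PSGM-CS, its projected variant. The least-squares objective, the uniform sampling, and the box projection are illustrative choices only.

    import numpy as np

    def sgm_cs(grad_i, x0, gamma, n, num_iters, project=None, seed=0):
        """Stochastic gradient method with constant step-size gamma.

        grad_i(x, i) returns the gradient of the i-th component function at x.
        If `project` is supplied, each iterate is projected back onto the
        feasible set (the PSGM-CS variant); otherwise plain SGM-CS is run.
        """
        rng = np.random.default_rng(seed)
        x = np.array(x0, dtype=float)
        for _ in range(num_iters):
            i = rng.integers(n)               # sample one component uniformly
            x = x - gamma * grad_i(x, i)      # constant step-size update
            if project is not None:
                x = project(x)                # projection step (PSGM-CS)
        return x

    # Illustrative least-squares problem: f(x) = (1/n) * sum_i (a_i^T x - b_i)^2.
    rng = np.random.default_rng(0)
    n, d = 200, 10
    A, b = rng.standard_normal((n, d)), rng.standard_normal(n)
    grad_i = lambda x, i: 2.0 * (A[i] @ x - b[i]) * A[i]

    x_sgm = sgm_cs(grad_i, np.zeros(d), gamma=1e-2, n=n, num_iters=5000)
    x_psgm = sgm_cs(grad_i, np.zeros(d), gamma=1e-2, n=n, num_iters=5000,
                    project=lambda x: np.clip(x, -1.0, 1.0))  # project onto [-1, 1]^d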

Cited by 18 publications (12 citation statements)
References 12 publications
“…Moreover, in practice VR methods do not tend to converge faster than SGD on over-parameterized models [19]. Indeed, recent works [83,52,5,47,13,33,73] have shown that when training over-parameterized models, classic SGD with a constant step-size and without VR can achieve the convergence rates of full-batch gradient descent. These works assume that the model is expressive enough to interpolate the data.…”
mentioning
confidence: 99%
“…for M = 0 we recover the uniformly bounded noise assumption. Furthermore, it has been proved that this property always holds under certain assumptions (Cevher and Vũ, 2019).…”
Section: Stochastic Noise
mentioning
confidence: 96%
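The quoted remark describes a noise model parameterized by M in which M = 0 recovers uniformly bounded noise. A plausible form of such a condition (our reading, not a formula quoted from the cited works) for a stochastic gradient g(x) of an objective f is:

    \[
      \mathbb{E}\bigl[\lVert g(x) \rVert^2\bigr] \;\le\; M \,\lVert \nabla f(x) \rVert^2 + \sigma^2 .
    \]

Setting M = 0 leaves only the constant bound σ², i.e. uniformly bounded noise, while σ = 0 gives a strong-growth-type bound of the kind (SGC) referenced in the abstract above.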
“…Vaswani et al. (2019) propose to use line-search to set the step-size while training over-parameterized models which can fit the data completely. Several other works propose to use a constant learning rate for stochastic gradient methods (Ma et al., 2017; Bassily et al., 2018; Liu & Belkin, 2018; Cevher & Vũ, 2019) while training extremely expressive models which interpolate. However, all of the above-mentioned works are primal-based algorithms.…”
Section: Related Work
mentioning
confidence: 99%