2020
DOI: 10.1109/tnnls.2019.2952219

Stochastic Gradient Descent for Nonconvex Learning Without Bounded Gradient Assumptions

Abstract: Stochastic gradient descent (SGD) is a popular and efficient method with wide applications in training deep neural nets and other nonconvex models. While the behavior of SGD is well understood in the convex learning setting, the existing theoretical results for SGD applied to nonconvex objective functions are far from mature. For example, existing results require imposing a nontrivial assumption on the uniform boundedness of gradients for all iterates encountered in the learning process, which is hard to verify…
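
As a point of reference for the setting the abstract describes, the following is a minimal sketch of plain SGD on a toy nonconvex objective with a polynomially decaying step size; the objective, constants, and variable names are illustrative assumptions, not the paper's algorithm or experiments.

# Minimal sketch (assumed toy example, not the paper's algorithm):
# SGD on the nonconvex objective F(w) = E[(sin(w) - z)^2], z ~ N(0.5, 0.1^2),
# with step size eta_t = eta_1 / t**alpha, alpha in (0, 1].
import numpy as np

rng = np.random.default_rng(0)

def stochastic_gradient(w):
    z = rng.normal(loc=0.5, scale=0.1)           # one noisy sample
    return 2.0 * (np.sin(w) - z) * np.cos(w)     # d/dw of (sin(w) - z)^2

w = 3.0                      # initial iterate
eta_1, alpha = 0.5, 1.0      # schedule parameters (illustrative)

for t in range(1, 10_001):
    eta_t = eta_1 / t ** alpha
    w -= eta_t * stochastic_gradient(w)

print(f"final iterate: {w:.3f}, sin(w): {np.sin(w):.3f}")  # sin(w) settles near 0.5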

Cited by 73 publications (59 citation statements)
References 19 publications
“…Our goal here is to move towards a more general theory of convergence that combines all of these threads under a single analysis framework. Specifically, by innovating on the strategies of Lei et al [2019] and Patel [2020], we will prove the following results for SGD with matrix-valued learning rates, which we state informally now and formalize later.…”
Section: Contributions (mentioning)
confidence: 98%
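
One way to read "SGD with matrix-valued learning rates" in the quote above is preconditioned SGD, where the scalar step size is replaced by a positive-definite matrix applied to the stochastic gradient. The Python sketch below illustrates only that mechanic on a least-squares problem; the fixed diagonal preconditioner and the 1/t schedule are assumptions for illustration, not the construction of the cited work.

# Illustrative sketch: SGD with a matrix-valued learning rate D (positive
# definite), i.e. w_{t+1} = w_t - eta_t * D @ g_t. Not the cited construction.
import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 5
A = rng.normal(size=(n, d))
b = rng.normal(size=n)

def stochastic_gradient(w):
    i = rng.integers(n)                 # sample one data point
    residual = A[i] @ w - b[i]
    return 2.0 * residual * A[i]        # gradient of (a_i . w - b_i)^2

# Fixed positive-definite diagonal preconditioner (inverse per-coordinate
# second moments); more generally D could change with t.
D = np.diag(n / (np.linalg.norm(A, axis=0) ** 2))

w = np.zeros(d)
for t in range(1, 5_001):
    w -= (1.0 / t) * (D @ stochastic_gradient(w))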
“…with α ∈ (0, 1], the objective function, F, evaluated at the SGD iterates converges almost surely to a bounded random variable. Moreover, Lei et al [2019] show that, when α = 1, the expected value of the norm of the gradient, ∇F, evaluated at the SGD iterates converges to zero.…”
Section: Introduction (mentioning)
confidence: 96%
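
Restated in display form, the two statements in the quote read roughly as follows (my paraphrase; the precise conditions are in Lei et al [2019] and the citing work):

\[
\eta_t = \frac{\eta_1}{t^{\alpha}}, \quad \alpha \in (0,1]
\;\Longrightarrow\;
F(w_t) \xrightarrow{\ \mathrm{a.s.}\ } F_\infty \ \text{(a bounded random variable)},
\]
\[
\alpha = 1
\;\Longrightarrow\;
\lim_{t \to \infty} \mathbb{E}\bigl[\lVert \nabla F(w_t) \rVert\bigr] = 0 .
\]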
“…It is usually solved in practice using toolboxes such as Scikit-Learn or TensorFlow, by means of a stochastic gradient descent method for which a full convergence proof under realistic assumptions is, to our knowledge, still unknown. See Bottou et al [2018] or E et al [2020] for recent in-depth reviews of these subjects, and Ghadimi and Lan [2013], Lei et al [2019], Fehrman et al [2020] for results for a nonconvex function.…”
Section: Description of the Algorithm (mentioning)
confidence: 99%
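
As a concrete instance of the toolbox usage the quote mentions, the Python snippet below runs scikit-learn's SGDClassifier with an inverse-scaling step size on synthetic data; the dataset and hyperparameters are purely illustrative and not taken from the cited work.

# Illustrative scikit-learn SGD run (assumed example, not from the cited work).
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

clf = SGDClassifier(
    loss="log_loss",             # logistic loss ("log" in older scikit-learn)
    learning_rate="invscaling",  # eta_t = eta0 / t**power_t
    eta0=0.1,
    power_t=0.5,
    max_iter=1000,
    random_state=0,
)
clf.fit(X, y)
print("training accuracy:", clf.score(X, y))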
“…There are four major approaches to handling nonconvex optimization problems. The first group of methods is based on stochastic gradient descent, and its convergence rates are far from those of other methods, as described in (Lei et al 2020). The second approach is to view nonconvex functions as a difference of convex functions, but this framework does not apply to every class of nonconvex problems.…”
Section: Introduction (mentioning)
confidence: 99%
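
For the "difference of convex functions" framework mentioned in the quote, a one-dimensional illustration (my own example, not from the cited work) is:

\[
f(x) \;=\; x^{4} - x^{2}
      \;=\; \underbrace{x^{4}}_{g(x)\ \text{convex}}
      \;-\; \underbrace{x^{2}}_{h(x)\ \text{convex}},
\]

so the nonconvex double-well function f is written as a difference of two convex functions g and h, even though f itself is not convex.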