2020
DOI: 10.4310/cms.2020.v18.n1.a7

Uniform-in-time weak error analysis for stochastic gradient descent algorithms via diffusion approximation

Abstract: Diffusion approximation provides weak approximation for stochastic gradient descent algorithms in a finite time horizon. In this paper, we introduce new tools motivated by the backward error analysis of numerical stochastic differential equations into the theoretical framework of diffusion approximation, extending the validity of the weak approximation from finite to infinite time horizon. The new techniques developed in this paper enable us to characterize the asymptotic behavior of constant-step-size SGD alg…

Cited by 6 publications (4 citation statements)
References 27 publications
“…If F = 0 we obtain an equation in the original time with a higher order term, which is reminiscent of the stochastic modified equations (cf. [4,8])…”
Section: Hilbert Expansion and Stochastic Modified Equations (mentioning)
confidence: 99%
“…However, implicit regularization and backward error analysis has not been explored. Backward error analysis was used [54,55] to study stochastic gradient descent in the context of stochastic differential equations and diffusion equations for the study of convergence and adaptive learning schemes, but, to the best of our knowledge, it has not been used to explore implicit regularization in gradient descent.…”
Section: Related Work (mentioning)
confidence: 99%
“…Our motivations, contributions and methods. It was recently discovered in [44,43,29,31,55,36,27,60,30,14] that SGD algorithms can be (weakly) approximated by continuous time SDEs. These SDEs often offer much needed insight to the algorithms under considerations, for instance, the continuous time treatment allows applications of stochastic control theory to develop novel adaptive algorithms [64,66].…”
Section: Introduction (mentioning)
confidence: 99%