2021
DOI: 10.48550/arxiv.2105.02062
Preprint
Understanding Short-Range Memory Effects in Deep Neural Networks

Abstract: Stochastic gradient descent (SGD) is of fundamental importance in deep learning. Despite its simplicity, elucidating its efficacy remains challenging. Conventionally, the success of SGD is attributed to the stochastic gradient noise (SGN) incurred in the training process. Based on this general consensus, SGD is frequently treated and analyzed as the Euler-Maruyama discretization of a stochastic differential equation (SDE) driven by either Brownian or Lévy stable motion. In this study, we argue that SGN is neit…
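The Euler-Maruyama view of SGD referenced in the abstract can be illustrated with a short sketch. This is a minimal illustration, not the paper's construction: the quadratic loss, learning rate, and Gaussian noise scale below are assumptions chosen for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_loss(theta):
    """Gradient of an illustrative quadratic loss L(theta) = 0.5 * theta**2."""
    return theta

eta = 0.1     # learning rate; plays the role of the SDE time step dt
sigma = 0.5   # assumed scale of the stochastic gradient noise (SGN)
theta = 2.0   # initial parameter value

# Each noisy gradient step below is one Euler-Maruyama step of the SDE
#   d(theta_t) = -grad L(theta_t) dt + sigma dB_t,   with dt = eta,
# where sigma * sqrt(eta) * N(0, 1) is the discretized Brownian increment.
for _ in range(1000):
    theta -= eta * grad_loss(theta)
    theta += sigma * np.sqrt(eta) * rng.normal()

print(f"final theta after noisy descent: {theta:.3f}")
```

Under a Gaussian SGN assumption the noisy-SGD and Euler-Maruyama updates coincide step for step; the abstract's argument is that empirical SGN departs from the standard Brownian and Lévy stable driving processes, which is what motivates revisiting the SDE model.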

Cited by 1 publication (1 citation statement)
References 32 publications
“…There are two sets of results that are of relevance to the problem discussed in this work. The first set consists of escape times for fBM, which are studied by [28, 4, 41, 45, 47]; however, these results are restricted to simple 1-dimensional processes, and they often apply only to specific settings such as Kramers' problem rather than to more general problems of interest. Extending these results to multiple dimensions is difficult, and we are not aware of any existing result.…”
Section: Related Work (citation type: mentioning; confidence: 99%)
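For context on the 1-dimensional, Kramers-type setting the quoted statement refers to, here is a minimal Monte-Carlo sketch of an escape time for a process driven by fractional Brownian motion (fBM). It reproduces none of the cited results; the double-well potential, noise scale sigma, Hurst index H, and Cholesky-based fBM sampler are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def fbm_increments(n, H, dt):
    """Sample n increments of fractional Brownian motion with Hurst index H
    via Cholesky factorization of the fractional-Gaussian-noise covariance.
    O(n^3): fine for small n; Davies-Harte would be the scalable choice."""
    k = np.arange(n)
    r = 0.5 * ((k + 1.0) ** (2 * H) - 2.0 * k ** (2 * H)
               + np.abs(k - 1.0) ** (2 * H))
    cov = r[np.abs(np.subtract.outer(k, k))]
    L = np.linalg.cholesky(cov + 1e-10 * np.eye(n))
    return (dt ** H) * (L @ rng.normal(size=n))

def escape_time(H, sigma=0.8, dt=0.01, n_max=2000):
    """First time an overdamped particle in V(x) = (x**2 - 1)**2 / 4,
    started in the left well at x = -1, crosses the barrier top at x = 0.
    Euler scheme: x_{k+1} = x_k - V'(x_k) dt + sigma * dB_k^H."""
    db = fbm_increments(n_max, H, dt)
    x = -1.0
    for step in range(n_max):
        x += -(x ** 3 - x) * dt + sigma * db[step]   # V'(x) = x**3 - x
        if x >= 0.0:
            return (step + 1) * dt
    return np.inf  # no escape within the simulated horizon

# Crude mean escape-time estimate for a persistent (H > 1/2) driver:
samples = [escape_time(H=0.7) for _ in range(20)]
finite = [t for t in samples if np.isfinite(t)]
mean_t = np.mean(finite) if finite else float("nan")
print(f"escaped in {len(finite)}/20 runs; mean escape time: {mean_t:.2f}")
```

Even this toy version hints at why the quoted statement calls the multi-dimensional extension difficult: the fBM driver is non-Markovian, so the escape event depends on the whole correlated noise path rather than on the current state alone.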