2021
DOI: 10.1007/s11222-021-10016-8
Analysis of stochastic gradient descent in continuous time

Abstract: Stochastic gradient descent is an optimisation method that combines classical gradient descent with random subsampling within the target functional. In this work, we introduce the stochastic gradient process as a continuous-time representation of stochastic gradient descent. The stochastic gradient process is a dynamical system that is coupled with a continuous-time Markov process living on a finite state space. The dynamical system—a gradient flow—represents the gradient descent part; the process on the finite state space represents the random subsampling. […]
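
To illustrate the construction described in the abstract, here is a minimal simulation sketch; the toy least-squares potentials, the uniform index process, and all names are illustrative assumptions, not taken from the paper. Between jumps the state follows the gradient flow of a single subsampled potential (integrated here with a plain Euler step); at exponential waiting times, i.e. at a Poisson rate, the index defining the potential is resampled, playing the role of the random subsampling in discrete-time SGD.

import numpy as np

rng = np.random.default_rng(0)

# Toy subsampled potentials f_i(x) = 0.5 * (a_i * x - b_i)**2, i = 0..9 (assumed example).
a = rng.normal(size=10)
b = rng.normal(size=10)

def grad_f(i, x):
    """Gradient of the i-th subsampled potential."""
    return a[i] * (a[i] * x - b[i])

def stochastic_gradient_process(x0, rate=5.0, T=10.0, dt=1e-3):
    """Euler discretisation of dx/dt = -grad f_{i(t)}(x), where the index
    process i(t) jumps to a uniformly resampled value at Poisson rate `rate`."""
    x, t = x0, 0.0
    i = rng.integers(len(a))                 # currently selected data index
    next_jump = rng.exponential(1.0 / rate)  # first exponential waiting time
    while t < T:
        x -= dt * grad_f(i, x)               # gradient-flow step on current potential
        t += dt
        if t >= next_jump:                   # regenerate the subsample
            i = rng.integers(len(a))
            next_jump += rng.exponential(1.0 / rate)
    return x

x_end = stochastic_gradient_process(x0=0.0)
x_star = np.sum(a * b) / np.sum(a * a)       # minimiser of the full-sample objective
print(x_end, x_star)                         # x_end fluctuates around x_star

For a moderate jump rate the path hovers near the minimiser of the full objective, which is the qualitative behaviour the continuous-time representation is meant to capture.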

Cited by 22 publications (25 citation statements)
References 53 publications
“…In a setting where the differential inclusions are actually differential equations and sufficiently smooth, one can sometimes show that (x_λ(t))_{t≥0} → (x(t))_{t≥0} in a weak sense, as λ → 0. We refer to [20, 23] for results of this type and a general perspective on stochastic approximation in continuous time.…”
Section: Problem Setting and Motivation
Citation type: mentioning (confidence: 99%)
“…see [23, Lemma 5] for details. The infinitesimal generator is the transition rate matrix that we give in Subsection 1.1; it has domain…”
Section: Feller Processes and Their Generators
Citation type: mentioning (confidence: 99%)
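
For reference, on a finite state space the infinitesimal generator and the transition rate matrix coincide in the following standard sense; this is a sketch of the usual definition, with the notation Q = (q_{jk}) assumed here rather than quoted from either paper:

% Generator A of a continuous-time Markov process (i(t))_{t >= 0} on the
% finite state space I = {1, ..., N} with transition rate matrix Q = (q_{jk}):
(A \varphi)(j) = \sum_{k \in I} q_{jk} \, \varphi(k)
               = \sum_{k \neq j} q_{jk} \bigl( \varphi(k) - \varphi(j) \bigr),
\qquad \varphi : I \to \mathbb{R},
% where the second equality uses that each row of Q sums to zero,
% i.e. q_{jj} = - \sum_{k \neq j} q_{jk}.

Since every real-valued function on a finite state space is bounded, the generator is simply the matrix Q acting on all of \mathbb{R}^N.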
“…We assume that we do not have direct access to the gradient ∇f(x) but to a random estimate ∇f(x, ξ), where ξ ∈ Ξ is random of law P. In the continuized framework, the randomness of the stochastic gradient and its time mix in a particularly convenient way. For similar reasons, Latz studied stochastic gradient descent as a gradient flow on a random function that is regenerated at a Poisson rate Latz [2021]. However, this approach has the same shortcomings as the other approaches based on gradient flows: the subsequent discretization introduces non-trivial errors.…”
Section: Continuized Nesterov Acceleration of Stochastic Gradient Descent
Citation type: mentioning (confidence: 99%)
“…Another set of related literature is on the diffusion approximation of SGD (Li, Tai and Weinan, 2017; Feng, Li and Liu, 2017; Yang, Hu and Li, 2021; Sirignano and Spiliopoulos, 2020; Latz, 2021). The authors aim to approximate the trajectory of SGD by a diffusion process which solves an SDE.…”
Citation type: mentioning (confidence: 99%)
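
The diffusion approximation referenced in this statement replaces the discrete SGD recursion x_{k+1} = x_k − η ∇f(x_k, ξ_k) by a stochastic differential equation, typically of the form dX_t = −∇f(X_t) dt + √η Σ(X_t)^{1/2} dW_t, with time identified as t = kη. A minimal one-dimensional sketch follows; the quadratic objective and the constant noise level σ are assumptions for illustration, not taken from the cited works:

import numpy as np

rng = np.random.default_rng(1)

def grad(x):          # f(x) = x**2 / 2, so the full gradient is x
    return x

eta = 0.1             # learning rate; also the SDE time step dt = eta
sigma = 0.5           # assumed constant noise level of the gradient estimate
n_steps = 200

# Discrete SGD: x_{k+1} = x_k - eta * (grad(x_k) + sigma * Z_k)
x_sgd = 2.0
for _ in range(n_steps):
    x_sgd -= eta * (grad(x_sgd) + sigma * rng.normal())

# Euler-Maruyama for the approximating SDE:
# dX_t = -grad(X_t) dt + sqrt(eta) * sigma dW_t, with dt = eta
x_sde, dt = 2.0, eta
for _ in range(n_steps):
    x_sde += -grad(x_sde) * dt + np.sqrt(eta) * sigma * np.sqrt(dt) * rng.normal()

print(x_sgd, x_sde)   # both fluctuate around the minimiser x = 0

With dt = η, the Euler–Maruyama increment −∇f(x) η + √η σ √η Z matches the mean and variance of the SGD increment −η (∇f(x) + σZ), which is the sense in which the diffusion process tracks the SGD trajectory.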