2017
DOI: 10.48550/arxiv.1702.03849
Preprint

Non-convex learning via Stochastic Gradient Langevin Dynamics: a nonasymptotic analysis

Abstract: Stochastic Gradient Langevin Dynamics (SGLD) is a popular variant of Stochastic Gradient Descent, where properly scaled isotropic Gaussian noise is added to an unbiased estimate of the gradient at each iteration. This modest change allows SGLD to escape local minima and suffices to guarantee asymptotic convergence to global minimizers for sufficiently regular non-convex objectives (Gelfand and Mitter, 1991). The present work provides a nonasymptotic analysis in the context of non-convex learning problems, giving…
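
As a rough illustration of the update described in the abstract, the following is a minimal sketch of one SGLD iteration in Python; the function and parameter names (sgld_step, grad_estimate, the toy objective) are hypothetical and not taken from the paper.

import numpy as np

def sgld_step(x, grad_estimate, step_size, beta, rng):
    # One SGLD update: a gradient step plus properly scaled isotropic Gaussian noise.
    #   x             -- current iterate, a NumPy array of shape (d,)
    #   grad_estimate -- callable returning an unbiased estimate of the gradient at x
    #   step_size     -- step size eta > 0
    #   beta          -- inverse temperature (larger beta means less injected noise)
    #   rng           -- numpy.random.Generator used to draw the Gaussian noise
    noise = rng.standard_normal(x.shape)
    return x - step_size * grad_estimate(x) + np.sqrt(2.0 * step_size / beta) * noise

# Toy usage on the non-convex objective F(x) = sum(x**4 - x**2), with the exact
# gradient standing in for a stochastic estimate (purely illustrative).
rng = np.random.default_rng(0)
x = rng.standard_normal(2)
grad = lambda z: 4 * z**3 - 2 * z
for _ in range(1000):
    x = sgld_step(x, grad, step_size=1e-2, beta=10.0, rng=rng)
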

Cited by 48 publications (130 citation statements)
References 26 publications

“…where ξ_k is a unit Gaussian random vector in R^d, η is the step size and β is an inverse temperature parameter. This update rule is known as stochastic gradient Langevin dynamics (SGLD) (Welling & Teh, 2011) and has been widely studied both in theory and practice (Raginsky et al., 2017; Zhang et al., 2017). Intuitively, (3.2) is an Euler discretization of the continuous-time diffusion equation:…”
Section: Preliminaries
confidence: 99%
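
For reference, the SGLD update rule the quoted passage refers to (its equation (3.2)) is commonly written as follows; this is the standard form of the recursion and of the diffusion it discretizes, not text quoted from the citing paper:

x_{k+1} = x_k - \eta\, g_k + \sqrt{2\eta/\beta}\, \xi_k, \qquad \xi_k \sim \mathcal{N}(0, I_d),

where g_k is an unbiased estimate of \nabla F(x_k). Letting \eta \to 0 recovers the continuous-time Langevin diffusion

dX_t = -\nabla F(X_t)\, dt + \sqrt{2/\beta}\, dW_t,

whose stationary distribution is proportional to \exp(-\beta F(x)).
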
“…Once F is shown to be dissipative, the machinery of (Raginsky et al., 2017; Zhang et al., 2017; Zou et al., 2020) can be adapted to show the convergence of CS-SGLD. The majority of the remainder of the paper is devoted to proving this series of technical claims.…”
Section: Preliminaries
confidence: 99%
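
The dissipativity condition invoked above is, in the form used by Raginsky et al. (2017), the requirement that there exist constants m > 0 and b \ge 0 such that

\langle x, \nabla F(x) \rangle \ge m \|x\|^2 - b \quad \text{for all } x \in \mathbb{R}^d,

i.e., outside a bounded region the gradient points back toward the origin; this is what supplies the moment and ergodicity bounds that the cited convergence machinery relies on.
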
“…To solve (10), we apply a stochastic gradient descent (SGD) algorithm. In general, SGD finds a local minimum of a nonconvex objective function (Ge et al., 2015) but a global minimizer in some special situations (Raginsky et al., 2017).…”
Section: Theoretical Example
confidence: 99%
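
For contrast with the SGLD step sketched earlier, a plain SGD update simply drops the injected Gaussian noise; a minimal, purely illustrative sketch (hypothetical names, not from the cited work):

def sgd_step(x, grad_estimate, step_size):
    # One SGD update: move against an unbiased stochastic estimate of the gradient.
    return x - step_size * grad_estimate(x)
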
“…There is a rich body of literature on discretizing Langevin diffusion [Dalalyan, 2016, Raginsky et al., 2017, Dalalyan, 2017], primarily used for designing efficient sampling algorithms, with a large fraction of them analyzing the above discretization. We will mostly rely on tools from Bubeck et al. [2018].…”
Section: D.1 Bounding the Error in the Euler-Maruyama Discretization
confidence: 99%
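
The Euler-Maruyama scheme referred to in this section title replaces the Langevin diffusion dX_t = -\nabla F(X_t)\, dt + \sqrt{2/\beta}\, dW_t on a grid of step size \eta by the recursion (standard form, stated here for context rather than quoted from the citing paper)

\hat{X}_{k+1} = \hat{X}_k - \eta\, \nabla F(\hat{X}_k) + \sqrt{2\eta/\beta}\, \zeta_k, \qquad \zeta_k \sim \mathcal{N}(0, I_d),

and the cited analyses bound how far the law of \hat{X}_k drifts from that of X_{k\eta} as a function of \eta and the number of steps.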