2017
DOI: 10.48550/arxiv.1707.05947
Preprint
Generalization Bounds of SGLD for Non-convex Learning: Two Theoretical Viewpoints

Abstract: Algorithm-dependent generalization error bounds are central to statistical learning theory. A learning algorithm may use a large hypothesis space, but the limited number of iterations controls its model capacity and generalization error. The impacts of stochastic gradient methods on generalization error for non-convex learning problems not only have important theoretical consequences, but are also critical to generalization errors of deep learning. In this paper, we study the generalization errors of Stochastic…

Cited by 14 publications (15 citation statements, 2018–2023); references 18 publications.

Citation statements, ordered by relevance:
“…We study this particular step-size setting of SGLD. It has been shown by Mou et al (2017) that SGLD has the following uniform stability for L-Lipschitz convex loss function,…”
Section: Other Methods With Known Stability
Mentioning confidence: 99%
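The snippet above refers to a particular step-size schedule for SGLD and its uniform stability. For orientation, here is a minimal sketch of the standard SGLD update (a stochastic-gradient step plus Gaussian noise whose scale depends on the step size and an inverse temperature); the function and parameter names are illustrative, not taken from the paper.

    import numpy as np

    def sgld_step(w, grad_fn, batch, step_size, beta, rng):
        # One SGLD update: a stochastic-gradient step plus Gaussian noise
        # with standard deviation sqrt(2 * step_size / beta), where beta is
        # the inverse temperature. All names here are illustrative placeholders.
        g = grad_fn(w, batch)                     # minibatch gradient estimate
        noise = rng.normal(size=w.shape)          # isotropic Gaussian noise
        return w - step_size * g + np.sqrt(2.0 * step_size / beta) * noise

    # Toy usage on f(w) = 0.5 * ||w||^2, whose gradient is w itself.
    rng = np.random.default_rng(0)
    w = rng.normal(size=5)
    for t in range(1, 101):
        w = sgld_step(w, lambda w, _: w, None, step_size=0.1 / t, beta=10.0, rng=rng)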
“…It remains unclear, however, what is the algorithmic stability of general iterative optimization algorithms. Recently, to show the effectiveness of commonly used optimization algorithms in many large-scale learning problems, algorithmic stability has been established for stochastic gradient methods (Hardt et al, 2016), stochastic gradient Langevin dynamics (Mou et al, 2017), as well as for any algorithm in situations where global minima are approximately achieved (Charles and Papailiopoulos, 2017).…”
Section: Related Work
Mentioning confidence: 99%
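For reference, the algorithmic (uniform) stability invoked in this snippet is the standard notion: a randomized algorithm A is ε-uniformly stable if, for every pair of training sets S and S' differing in a single example (the notation below is the usual one, not quoted from these papers),

    \sup_{z} \left| \mathbb{E}_{A}\big[\ell(A(S), z)\big] - \mathbb{E}_{A}\big[\ell(A(S'), z)\big] \right| \le \varepsilon,

where \ell is the loss and the expectation is over the algorithm's internal randomness; an ε-uniformly stable algorithm has an expected generalization gap of at most ε.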
“…However, how to extend these results to non-linear neural networks remains unclear (Wei et al, 2018). Another line of algorithm-dependent analysis of generalization (Hardt et al, 2015; Mou et al, 2017; Chen et al, 2018) used stability of specific optimization algorithms that satisfy certain generic properties like convexity, smoothness, etc. However, as the number of epochs becomes large, these generalization bounds are vacuous.…”
Section: Introduction
Mentioning confidence: 99%
“…From the theoretical perspective, the convergence guarantee of SGLD has been proved for both strongly log-concave distributions (Dalalyan and Karagulyan, 2017) and non-log-concave distributions (Raginsky et al, 2017; Xu et al, 2018) in 2-Wasserstein distance. Mou et al (2017) further studied the generalization performance of SGLD for nonconvex optimization. Although SGLD can drastically reduce the computational cost, it is also observed to have a slow convergence rate due to the large variance caused by the stochastic gradient.…”
Section: Introduction
Mentioning confidence: 99%