2021
DOI: 10.48550/arxiv.2110.06914
Preprint

What Happens after SGD Reaches Zero Loss? --A Mathematical Framework

Abstract: Understanding the implicit bias of Stochastic Gradient Descent (SGD) is one of the key challenges in deep learning, especially for overparametrized models, where the local minimizers of the loss function L can form a manifold. Intuitively, with a sufficiently small learning rate η, SGD tracks Gradient Descent (GD) until it gets close to such a manifold, where the gradient noise prevents further convergence. In such a regime, Blanc et al. (2020) proved that SGD with label noise locally decreases a regularizer-like […]
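To make the mechanism described in the abstract concrete, here is a minimal numerical sketch (not from the paper): SGD with label noise on the toy overparametrized model f(w) = w1*w2 fit to a single target y = 1, whose zero-loss set {w1*w2 = 1} is a one-dimensional manifold of minimizers. The model, learning rate, noise level, and step count are all illustrative assumptions; the behavior the framework predicts is a slow drift along that manifold toward the flattest minimizer, |w1| = |w2| = 1, where the sharpness tr ∇²L = w1² + w2² is smallest.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy overparametrized model: f(w) = w1 * w2, one data point with target y = 1.
# Loss L(w) = 0.5 * (w1 * w2 - 1)^2, so the zero-loss minimizers form the
# manifold {w1 * w2 = 1}. On that manifold the trace of the Hessian of L is
# w1^2 + w2^2, minimized at |w1| = |w2| = 1. (Illustrative setup, not from the paper.)
eta = 0.01        # learning rate (small enough to stay near the manifold)
delta = 0.5       # label-noise magnitude
steps = 200_000

w = np.array([4.0, 0.25])   # a "sharp" zero-loss minimizer: tr(Hessian) ~ 16.06

for t in range(steps):
    y_noisy = 1.0 + delta * rng.choice([-1.0, 1.0])   # label noise on the target
    residual = w[0] * w[1] - y_noisy
    grad = residual * np.array([w[1], w[0]])           # gradient of 0.5 * residual^2
    w -= eta * grad                                    # one SGD step
    if t % 40_000 == 0:
        print(f"step {t:6d}  w = ({w[0]:+.3f}, {w[1]:+.3f})  "
              f"sharpness ~ {w[0]**2 + w[1]**2:.3f}")

print(f"final  w = ({w[0]:+.3f}, {w[1]:+.3f})  sharpness ~ {w[0]**2 + w[1]**2:.3f}")
```

The drift is a second-order effect, so it only shows up over many steps; this matches the abstract's picture of SGD hovering near the manifold at small η while the noise slowly selects flatter minimizers. Constants and timescales here are not calibrated to the paper's theorems.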

Cited by 4 publications (15 citation statements). References 30 publications.

“…which recovers the results in [15]. This quasistatic approach can be easily adapted to other types of noise and other optimizers such as SGD momentum.…”
Section: What Happens After the Edge of Stability (supporting)
confidence: 56%
“…Remark 1. The idea of flatness-driven motion along the manifold is similar to that in [15], but our result is essentially different. We treat GD instead of SGD, and in our case, the motion along the manifold is made possible by the subquadratic landscape around the minima, instead of the SGD noise.…”
Section: What Happens After the Edge of Stability (mentioning)
confidence: 54%
“…The mechanism of SGD's exploration among different minima is made clear in the recent work [14], which characterizes the movement of SGD iterates along the minima manifold. This picture of exploration along minima manifolds suits the neural network problem better than the exploration among isolated minima.…”
Section: Related Work (mentioning)
confidence: 99%