2019
DOI: 10.1088/1751-8121/ab4c8b

A jamming transition from under- to over-parametrization affects generalization in deep learning

Abstract: In this paper we first recall the recent result that in deep networks a phase transition, analogous to the jamming transition of granular media, delimits the over- and under-parametrized regimes where fitting can or cannot be achieved. The analysis leading to this result supports that, for proper initialization and architectures, in the whole over-parametrized regime poor minima of the loss are not encountered during training, because the number of constraints that hinders the dynamics is insufficient to allow fo…


Cited by 86 publications (84 citation statements)
References: 36 publications

“…Recent works suggest that the two questions above are closely connected. Numerical and theoretical studies [6,7,8,9,10,11,12,13,14,15,16,17] show that in the overparametrized regime, the loss landscape of DNNs is not rough with isolated minima as initially thought [18,19], but instead has connected level sets and presents many flat directions, even near its global minimum. In particular, recent works on the over-parametrized regime of DNNs [20,21,22,23] have shown that the landscape around a typical initialization point becomes essentially convex, allowing for convergence to a global minimum during training.…”
Section: Introduction (mentioning)
confidence: 99%
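
As a concrete illustration of the "many flat directions, even near its global minimum" point in the passage above, the toy check below is a minimal sketch, not taken from any of the cited works; the network size, data, and optimizer are arbitrary assumptions. It trains a small over-parametrized network (N parameters, P data points, N >> P) to interpolation and counts near-zero Hessian eigenvalues. At an exact interpolating minimum of the mean-squared loss the Hessian reduces to a Gauss-Newton term of rank at most P, so at least N - P directions are flat.

```python
# Minimal sketch (hypothetical sizes): flat directions of the loss near an
# interpolating minimum of an over-parametrized one-hidden-layer network.
import torch

torch.manual_seed(0)
P, d, h = 8, 4, 20
N = d * h + h                 # 100 parameters (no biases) versus P = 8 constraints
X, y = torch.randn(P, d), torch.randn(P)

def loss_fn(theta):
    W1 = theta[: d * h].reshape(h, d)
    w2 = theta[d * h:]
    return ((torch.tanh(X @ W1.T) @ w2 - y) ** 2).mean()

theta = (0.5 * torch.randn(N)).requires_grad_(True)
opt = torch.optim.Adam([theta], lr=1e-2)
for _ in range(5000):         # full-batch training down to (near) interpolation
    opt.zero_grad()
    loss_fn(theta).backward()
    opt.step()

# At an exact interpolating minimum of the mean-squared loss the Hessian reduces to
# a Gauss-Newton term of rank at most P, so at least N - P eigenvalues vanish.
H = torch.autograd.functional.hessian(loss_fn, theta.detach())
eigs = torch.linalg.eigvalsh(H)
print(f"final training loss: {loss_fn(theta).item():.2e}")
print(f"N = {N} parameters, P = {P} constraints, "
      f"near-zero Hessian eigenvalues (|lambda| < 1e-4): {int((eigs.abs() < 1e-4).sum())}")
```
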
“…In [16,17], it has been observed that when optimizing DNNs (using the so-called hinge loss), there is a sharp phase transition, whose location can depend on the chosen dynamics, at some N*(P) such that for N ≥ N* the dynamic process reaches a global minimum of the loss. In particular, whenever N > N*, the training error (i.e.…”
Section: Introduction (mentioning)
confidence: 99%
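
The transition described in this statement can be probed with a simple width sweep. The sketch below is a hedged illustration, not the authors' protocol: it fixes P randomly labelled points, trains one-hidden-layer networks of increasing width with the hinge loss, and records the final training error; the width at which the error first reaches zero gives a rough estimate of N*(P). The dataset, architecture, optimizer, and step counts are all arbitrary assumptions.

```python
# Hedged sketch (not the authors' exact setup): locate the jamming point N*(P)
# by sweeping the hidden width at fixed dataset size under the hinge loss.
import torch

torch.manual_seed(1)
P, d = 64, 10                                    # 64 training points in dimension 10
X = torch.randn(P, d)
y = torch.randint(0, 2, (P,)).float() * 2 - 1    # random ±1 labels (hardest to fit)

def run(width, steps=3000, lr=5e-2):
    model = torch.nn.Sequential(
        torch.nn.Linear(d, width), torch.nn.Tanh(), torch.nn.Linear(width, 1)
    )
    n_params = sum(p.numel() for p in model.parameters())
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        out = model(X).squeeze(1)
        torch.relu(1.0 - y * out).mean().backward()   # hinge loss on ±1 labels
        opt.step()
    with torch.no_grad():
        err = (torch.sign(model(X).squeeze(1)) != y).float().mean().item()
    return n_params, err

for width in (1, 2, 4, 8, 16, 32, 64):
    n_params, err = run(width)
    print(f"width {width:3d}   N = {n_params:5d}   final training error = {err:.2f}")
```

In this kind of sweep, narrow networks are stuck with a finite training error, while sufficiently wide ones fit all the random labels; the crossover width is the (dynamics-dependent) estimate of the transition.
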
“…HT-MU applies to the analysis of complicated systems, including many physical systems, traditional NNs [23,24], and even models of the dynamics of actual spiking neurons. Indeed, the dynamics of learning in DNNs seems to resemble a system near a phase transition, such as the phase boundary of a spin glass, or a system displaying Self-Organized Criticality (SOC), or a jamming transition [25,26]. Of course, we cannot say which mechanism, if any, is at play.…”
Section: Introduction (mentioning)
confidence: 99%
“…For the peaking phenomenon [2], this occurs exactly at the transition from the underparametrized to the overparametrized regime. This phenomenon has regained interest in the machine learning community in the context of deep neural networks [5,6], since these models are typically overparametrized. Recently, several new examples have also been found where, in quite simple settings, more data results in worse generalization performance [7,8].…”
Section: Introduction (mentioning)
confidence: 99%
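
The peaking shape is easy to reproduce in a minimal setting. The sketch below is an illustrative example, not taken from the works cited in the statement; the teacher function, noise level, and feature counts are arbitrary assumptions. It fits ridgeless least squares on random ReLU features: the test error typically spikes when the number of features is close to the number of training samples (the interpolation threshold) and decreases again deep in the over-parametrized regime.

```python
# Hedged NumPy sketch of the peaking / double-descent shape: ridgeless (minimum-norm)
# least squares on random ReLU features, sweeping the number of features k past n.
import numpy as np

rng = np.random.default_rng(0)
n, n_test, d = 100, 2000, 20
w_teacher = rng.standard_normal(d) / np.sqrt(d)

def make_data(m):
    X = rng.standard_normal((m, d))
    y = X @ w_teacher + 0.1 * rng.standard_normal(m)   # noisy linear teacher
    return X, y

X_train, y_train = make_data(n)
X_test, y_test = make_data(n_test)

for k in (10, 50, 90, 100, 110, 200, 1000):
    W = rng.standard_normal((d, k)) / np.sqrt(d)       # fixed random projection
    F_train = np.maximum(X_train @ W, 0.0)             # random ReLU features
    F_test = np.maximum(X_test @ W, 0.0)
    # minimum-norm least-squares fit; the interpolation threshold sits near k = n
    coef, *_ = np.linalg.lstsq(F_train, y_train, rcond=None)
    test_mse = np.mean((F_test @ coef - y_test) ** 2)
    print(f"k = {k:5d}   test MSE = {test_mse:.3g}")
```
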