2019
DOI: 10.1103/PhysRevE.100.012115

Jamming transition as a paradigm to understand the loss landscape of deep neural networks

Abstract: Deep learning has been immensely successful at a variety of tasks, ranging from classification to artificial intelligence. Learning corresponds to fitting training data, which is implemented by descending a very high-dimensional loss function. Understanding under which conditions neural networks do not get stuck in poor minima of the loss, and how the landscape of that loss evolves as depth is increased, remains a challenge. Here we predict, and test empirically, an analogy between this landscape and the energy…


Cited by 94 publications (120 citation statements)
References 52 publications
“…(Example 5 in Section 4 is just a caricature in that direction.) In the empirical double descent papers [37,38], the second descent of the test MSE in the overparameterized regime is strictly better than the first descent in the underparameterized regime. From our concrete understanding of the correctly specified high-dimensional regime, it is clear that this is only possible in the presence of an approximation-theoretic benefit of adding more features into the model.…”
Section: Future Directions (mentioning)
confidence: 99%
“…Most recently, a double-descent curve in the test error (0-1 loss and MSE) as a function of the number of parameters of several parametric models was observed on several common datasets by physicists [37] and machine learning researchers [38], respectively. In these experiments, the minimum ℓ2-norm interpolating solution is used, and several feature families, including kernel approximators [39], were considered.…”
Section: High-dimensional Linear Regression (mentioning)
confidence: 99%
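The minimum ℓ2-norm interpolating solution referred to in this quote is simply the pseudoinverse fit, and the double-descent behavior it describes can be probed in a few lines. The following is a minimal sketch, not code from the cited works; the random-ReLU feature map, data sizes, noise level, and feature counts are illustrative assumptions, and a peak in test MSE typically (though not always) appears near the interpolation threshold p ≈ n.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, n_test = 100, 10, 1000          # training size, input dim, test size (assumed)
w_true = rng.normal(size=d)

def make_data(m):
    # Noisy linear teacher; the teacher and noise level are illustrative choices.
    X = rng.normal(size=(m, d))
    y = X @ w_true + 0.1 * rng.normal(size=m)
    return X, y

X_train, y_train = make_data(n)
X_test, y_test = make_data(n_test)

def relu_features(X, W):
    # Random ReLU features phi(x) = max(0, W x); W is shared by train and test.
    return np.maximum(X @ W.T, 0.0)

for p in [10, 50, 90, 100, 110, 200, 1000]:
    W = rng.normal(size=(p, d)) / np.sqrt(d)
    Phi_train = relu_features(X_train, W)
    Phi_test = relu_features(X_test, W)
    # Minimum l2-norm solution among all least-squares fits (pseudoinverse);
    # for p > n it interpolates the training data exactly.
    beta = np.linalg.pinv(Phi_train) @ y_train
    mse = np.mean((Phi_test @ beta - y_test) ** 2)
    print(f"p = {p:5d}   test MSE = {mse:.3f}")
```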
“…HT-MU applies to the analysis of complicated systems, including many physical systems, traditional NNs [23,24], and even models of the dynamics of actual spiking neurons. Indeed, the dynamics of learning in DNNs seems to resemble a system near a phase transition, such as the phase boundary of a spin glass, a system displaying Self-Organized Criticality (SOC), or a Jamming transition [25,26]. Of course, we cannot say which mechanism, if any, is at play.…”
Section: Introduction (mentioning)
confidence: 99%
“…Understanding the nature of such glass transitions and jamming is a fundamental problem in CSPs, since it is intimately related to the efficiency of algorithms for solving CSPs. In the context of DNNs, it is certainly important to understand the characteristics of the free-energy landscape in order to understand the efficiency of various learning algorithms for DNNs [18][19][20].…”
mentioning
confidence: 99%
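To make the constraint-satisfaction picture in this quote concrete: in the jamming analogy, a network (reduced here to a single spherical perceptron for brevity) must satisfy one margin constraint per training pattern, and the loss is a quadratic hinge that penalizes only the violated constraints. The sketch below is an assumption-laden illustration, not code from the cited papers; the pattern count, dimension, margin, learning rate, and step count are arbitrary, and with more constraints than the perceptron can satisfy the dynamics stall at a positive "jammed" energy.

```python
import numpy as np

rng = np.random.default_rng(1)
n_patterns, dim, margin, lr = 300, 100, 0.0, 0.1   # illustrative sizes (alpha = 3)

# Random patterns xi and labels y define one constraint per pattern:
#   y_mu * <w, xi_mu> >= margin
xi = rng.normal(size=(n_patterns, dim)) / np.sqrt(dim)
y = rng.choice([-1.0, 1.0], size=n_patterns)

def hinge_energy(w):
    gaps = y * (xi @ w) - margin              # signed gap of each constraint
    violated = gaps < 0
    # Quadratic hinge: only violated constraints contribute to the energy.
    return 0.5 * np.sum(gaps[violated] ** 2), int(violated.sum())

# Plain gradient descent on the hinge energy. If all constraints can be
# satisfied (SAT phase) the energy reaches zero; otherwise it stops at a
# positive value with a finite number of unsatisfied constraints.
w = rng.normal(size=dim)
for step in range(5000):
    gaps = y * (xi @ w) - margin
    mask = gaps < 0
    grad = (y[mask] * gaps[mask]) @ xi[mask]   # gradient of the quadratic hinge
    w -= lr * grad

energy, n_unsat = hinge_energy(w)
print(f"final energy = {energy:.4f}, unsatisfied constraints = {n_unsat}/{n_patterns}")
```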