2018
DOI: 10.48550/arxiv.1804.10200
Preprint

The loss landscape of overparameterized neural networks

Abstract: We explore some mathematical features of the loss landscape of overparameterized neural networks. A priori one might imagine that the loss function looks like a typical function from R^n to R - in particular, nonconvex, with discrete global minima. In this paper, we prove that in at least one important way, the loss function of an overparameterized neural network does not look like a typical function. If a neural net has n parameters and is trained on d data points, with n > d, we show that the locus M of global minima of L is usually not discrete, but rather an (n - d)-dimensional submanifold of R^n.
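
A minimal numerical sketch of the effect the abstract describes, using an overparameterized linear least-squares model as a stand-in for a neural network (an illustrative assumption, not the paper's construction): with n parameters and d < n data points, the global minimizers of the squared loss form an affine subspace of dimension n - d, the linear analogue of the (n - d)-dimensional submanifold M.

```python
# Toy sketch (illustrative, not from the paper): overparameterized linear
# least squares with n parameters and d < n data points. The set of global
# minimizers of L(theta) = ||X @ theta - y||^2 is an affine subspace of
# dimension n - d whenever X has full row rank.
import numpy as np

rng = np.random.default_rng(0)
n, d = 10, 4                                   # n parameters, d data points, n > d
X = rng.standard_normal((d, n))                # "data" matrix, full row rank almost surely
y = rng.standard_normal(d)

theta_star = np.linalg.lstsq(X, y, rcond=None)[0]   # one global minimizer (zero loss)
_, _, Vh = np.linalg.svd(X)                         # rows Vh[d:] span the null space of X
null_basis = Vh[d:]                                 # shape (n - d, n)

# Moving theta_star along any null-space direction keeps the loss at its minimum.
v = null_basis.T @ rng.standard_normal(n - d)
assert np.allclose(X @ (theta_star + v), y)

print("dimension of the set of global minima:", n - np.linalg.matrix_rank(X))  # n - d
```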

Cited by 26 publications (27 citation statements) | References: 0 publications

“…Later this phenomenon is explained under generic assumptions by Kuditipudi et al. (2019). Moreover, it has been proved that the local/global minimizers of an overparametrized network form a low-dimensional manifold (Cooper, 2018; 2020) which possibly has many components. Fehrman et al. (2020) proved the convergence rate of SGD to the manifold of local minimizers starting in a small neighborhood.…”
Section: Related Work
confidence: 98%
“…Another practical motivation for studying mode connectivity is to find better optima on the curve or through some ensemble technique. On the theory side, [77] proves that the locus of global minima of an overparameterized NN is a "connected submanifold". Another paper [78] studies a more general property on the connectivity of "sublevel sets" for deep linear NNs and one-hidden-layer ReLU networks.…”
Section: Related Work
confidence: 99%
“…Concerning the training of neural networks via SGD, we refer the reader to [BM11], [JW20]. Related target functions (loss landscapes) are analysed in [Coo18], [Ngu19], [Coo20], [PRV20] and [QZX20].…”
Section: Introduction
confidence: 99%