2021
DOI: 10.1137/19m1308943
Global Minima of Overparameterized Neural Networks

Cited by 19 publications (23 citation statements)
References 1 publication
“…The Embedding Principle provides a structural mechanism underlying degeneracy as a very common property of critical points (Choromanska et al., 2015; Sagun et al., 2016). It thus complements the understanding that global minima of NNs typically form a high-dimensional manifold due to over-parameterization (Cooper, 2021).…”
Section: Related Work (mentioning)
confidence: 70%
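As a hedged, illustrative sketch of the over-parameterization count referenced above (our own toy setup, not code from Cooper (2021) or the citing papers): at an exact zero-loss solution the residual Jacobian J has only n rows, so the loss Hessian 2 JᵀJ has rank at most n and therefore at least p - n zero eigenvalues when the network has p parameters, matching the picture of global minima forming a roughly (p - n)-dimensional set. The tiny tanh network and all names below are assumed purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 5, 2, 20                     # n samples, input dim d, m hidden units
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

def forward(theta):
    # One hidden tanh layer; theta packs W (m x d), b (m), a (m).
    W = theta[:m * d].reshape(m, d)
    b = theta[m * d:m * d + m]
    a = theta[m * d + m:]
    return np.tanh(X @ W.T + b) @ a

# Build an exact global minimum: fix a random hidden layer and solve the
# underdetermined linear system for the output weights so the residuals vanish.
W0, b0 = rng.normal(size=(m, d)), rng.normal(size=m)
H = np.tanh(X @ W0.T + b0)                     # n x m features, full row rank a.s.
a0 = np.linalg.lstsq(H, y, rcond=None)[0]      # interpolating output weights
theta0 = np.concatenate([W0.ravel(), b0, a0])
print("training loss:", np.sum((forward(theta0) - y) ** 2))   # ~ 0

# Residual Jacobian J (n x p) by finite differences. At zero residual the loss
# Hessian equals 2 J^T J, whose rank is at most n, so its nullity is >= p - n.
p, eps = theta0.size, 1e-6
r0 = forward(theta0) - y
J = np.zeros((n, p))
for j in range(p):
    t = theta0.copy()
    t[j] += eps
    J[:, j] = (forward(t) - y - r0) / eps
eigs = np.linalg.eigvalsh(J.T @ J)
flat = int(np.sum(eigs < 1e-8 * eigs.max()))
print(f"p = {p}, n = {n}, near-zero curvature directions: {flat} (expect >= {p - n})")
```

With p = 80 and n = 5 the script should report roughly 75 near-zero directions, i.e., almost all directions are flat at this global minimum.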
“…We show that the degeneracy of a critical point increases substantially when it is embedded into a wider network, because a critical point can be mapped to a high-dimensional critical submanifold through a class of critical embeddings. This degeneracy arises from the redundancy of neurons in the wide NN when representing certain simple critical functions of narrower NNs, which differs from the over-parameterization-induced degeneracy studied in Cooper (2021). We also study properties of the Hessian at critical points through critical embedding, e.g., the number of its negative eigenvalues, which determines whether the corresponding critical point is a strict saddle that enables easy optimization.…”
Section: Introduction (mentioning)
confidence: 99%
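A minimal sketch of the neuron-splitting mechanism alluded to above (a simplification we assume for illustration, not the cited paper's general embedding): duplicating one hidden neuron and splitting its output weight between the two copies leaves the network function, and hence the loss and its gradient, unchanged for every split ratio, so one parameter vector of the narrow network maps to a whole curve of equivalent parameter vectors in the wider one.

```python
import numpy as np

rng = np.random.default_rng(1)
d, m, n = 3, 4, 10
X = rng.normal(size=(n, d))

# Parameters of a narrow one-hidden-layer network: f(x) = a . tanh(W x + b).
W, b, a = rng.normal(size=(m, d)), rng.normal(size=m), rng.normal(size=m)

def narrow(X):
    return np.tanh(X @ W.T + b) @ a

def wide(X, alpha, j=0):
    # Embed into a network with m + 1 hidden units by splitting neuron j:
    # copy its input weights and share its output weight as (alpha, 1 - alpha).
    W2 = np.vstack([W, W[j]])
    b2 = np.append(b, b[j])
    a2 = np.append(a, 0.0)
    a2[j], a2[m] = alpha * a[j], (1.0 - alpha) * a[j]
    return np.tanh(X @ W2.T + b2) @ a2

# The wide network realizes exactly the same function for every alpha.
for alpha in (0.0, 0.3, 1.5):
    print(alpha, np.max(np.abs(wide(X, alpha) - narrow(X))))   # ~ 0 for all alpha
```

Because the output is identical along the whole alpha-line, a critical point of the narrow network corresponds to a degenerate one-parameter family of critical points of the wider network; the cited work studies such critical embeddings in far greater generality.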
“…Due to its importance for understanding the behavior, performance, and limitations of machine learning algorithms, the study of the loss landscape of training problems for artificial neural networks has received considerable attention in recent years. Compare, for instance, the early works [3,6,34] on this topic, the contributions on stationary points and plateau phenomena in [1,9,15,17,50], the results on suboptimal local minima and valleys in [11,19,24,37,41,48,52], and the overview articles [5,45,46]. For fully connected feedforward neural networks involving activation functions with an affine segment, much of the research on landscape properties was initially motivated by the observation of Kawaguchi [30] that networks with linear activation functions give rise to learning problems that do not possess spurious (i.e., not globally optimal) local minima and thus behave, at least as far as the notion of local optimality is concerned, like convex problems.…”
mentioning
confidence: 83%
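To make the cited observation about linear activations concrete, here is a hedged numerical sketch (our own toy setup, not taken from any of the referenced papers): a two-factor linear network trained by plain gradient descent from several random initializations should reach essentially the same loss as the unconstrained least-squares fit, consistent with the absence of spurious local minima.

```python
import numpy as np

rng = np.random.default_rng(2)
d_in, d_h, d_out, n = 4, 3, 2, 50
X = rng.normal(size=(n, d_in))
Y = X @ rng.normal(size=(d_in, d_out)) + 0.1 * rng.normal(size=(n, d_out))

def loss(W1, W2):
    return 0.5 * np.mean(np.sum((X @ W1 @ W2 - Y) ** 2, axis=1))

# Global optimum: the hidden width d_h >= min(d_in, d_out), so the product
# W1 W2 can realize the unconstrained least-squares map.
W_star = np.linalg.lstsq(X, Y, rcond=None)[0]
best = 0.5 * np.mean(np.sum((X @ W_star - Y) ** 2, axis=1))

# Plain gradient descent on both factors from several random initializations.
for trial in range(5):
    W1 = 0.5 * rng.normal(size=(d_in, d_h))
    W2 = 0.5 * rng.normal(size=(d_h, d_out))
    lr = 0.05
    for _ in range(10000):
        R = (X @ W1 @ W2 - Y) / n          # scaled residuals
        G1 = X.T @ R @ W2.T                # dL/dW1
        G2 = W1.T @ (X.T @ R)              # dL/dW2
        W1 -= lr * G1
        W2 -= lr * G2
    print(f"run {trial}: loss {loss(W1, W2):.5f}   (least-squares optimum {best:.5f})")
```

Every run should end up close to the same value, mirroring the no-spurious-local-minima property of linear networks.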
“…Before we demonstrate that the effects discussed in Theorems 3.1 and 3.2 and Corollary 5.3 can indeed affect the behavior of gradient-based optimization algorithms in practice, we would like to point out that the "space-filling" cases $\mathrm{cl}_Z(\iota(\Psi(D))) = Z$ and $\mathrm{cl}_{L^p_\mu(K)}(\Psi(D)) = L^p_\mu(K)$ in Corollaries 5.1 to 5.3 are not as pathological as one might think at first glance. In fact, in many applications, neural networks are trained in an "overparameterized" regime in which the number of degrees of freedom in $\psi$ exceeds the number of training samples by far and in which $\psi$ is able to fit arbitrary training data with zero error, see [2,8,15,32,39]. In the situation of Lemma 3.3, this means that a measure $\mu$ of the form $\mu = \frac{1}{n}\sum_{k=1}^{n} \delta_{x_k}$ supported on a finite set…”
Section: Further Consequences of the Nonexistence of Supporting Half-... (mentioning)
confidence: 99%
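A small sketch of the overparameterized regime described above (illustrative assumptions only; the random-feature construction below is ours, not the cited paper's): with many more hidden units than training points, the hidden feature matrix has full row rank almost surely, so the output weights can fit arbitrary label vectors with zero error on the empirical measure $\mu = \frac{1}{n}\sum_{k=1}^{n} \delta_{x_k}$.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, m = 20, 5, 200                      # far more hidden units than samples
X = rng.normal(size=(n, d))

# Random hidden layer; with m >> n the n x m feature matrix H has full row rank
# almost surely, so for *any* labels y there are output weights a with H a = y.
W, b = rng.normal(size=(m, d)), rng.normal(size=m)
H = np.tanh(X @ W.T + b)

for trial in range(3):
    y = rng.normal(size=n)                # arbitrary (here: random) labels
    a = np.linalg.lstsq(H, y, rcond=None)[0]
    print(f"label set {trial}: max |H a - y| = {np.max(np.abs(H @ a - y)):.2e}")
```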
“…The depth of the circuits is used to determine whether the first updated parameter avoids the barren plateau problem during training. Recently, it has been shown that barren plateaus are absent in QNNs and QCNNs with a tree tensor network (TTN) architecture [25,26].…”
Section: Introduction (mentioning)
confidence: 99%