2021
DOI: 10.48550/arxiv.2111.15527
Preprint

Embedding Principle: a hierarchical structure of loss landscape of deep neural networks

Abstract: We prove a general Embedding Principle of the loss landscape of deep neural networks (NNs) that unravels a hierarchical structure of the loss landscape of NNs, i.e., the loss landscape of an NN contains all critical points of all the narrower NNs. This result is obtained by constructing a class of critical embeddings which map any critical point of a narrower NN to a critical point of the target NN with the same output function. By discovering a wide class of general compatible critical embeddings, we provide a gross …
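The splitting construction behind such critical embeddings can be made concrete. Below is a minimal sketch (not code from the paper) assuming a one-hidden-layer tanh network; the names `narrow_net`, `split_embedding`, and the parameter `alpha` are illustrative. The sketch only verifies that the embedded, wider network realizes the same output function; the paper's contribution is that the corresponding parameter embedding also maps critical points of the narrower loss landscape to critical points of the wider one.

```python
import numpy as np

def narrow_net(x, W, a):
    """One-hidden-layer tanh network: f(x) = sum_k a[k] * tanh(W[k] . x)."""
    return a @ np.tanh(W @ x)

def split_embedding(W, a, k, alpha=0.5):
    """Widen the hidden layer by one neuron via a splitting embedding:
    duplicate the input weights of neuron k and split its output weight
    into alpha*a[k] and (1-alpha)*a[k]; the output function is unchanged."""
    W_wide = np.vstack([W, W[k:k + 1]])          # new neuron copies neuron k's input weights
    a_wide = np.append(a, (1.0 - alpha) * a[k])  # new neuron's output weight
    a_wide[k] = alpha * a[k]                     # original neuron keeps the remainder
    return W_wide, a_wide

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 5))   # width-3 hidden layer, 5 inputs
a = rng.standard_normal(3)
W_wide, a_wide = split_embedding(W, a, k=1, alpha=0.3)

x = rng.standard_normal(5)
print(np.isclose(narrow_net(x, W, a), narrow_net(x, W_wide, a_wide)))  # True
```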

Cited by 3 publications (4 citation statements) | References 25 publications
“…In general, without global assumptions on the objective function such as convexity, gradient-based methods may converge to non-global local minima or saddle points. It therefore becomes important to analyze critical points of the objective function in the training of ANNs and we refer, for example, to [14,65,68,73,74] for articles which study the appearance of critical points in the risk landscape in the training of ANNs. The question under which conditions gradient-based optimization algorithms cannot converge to saddle points was investigated, for example, in [32,48,49,58,59].…”
Section: Introduction and Main Results (mentioning)
confidence: 99%
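As a concrete instance of the remark quoted above (a toy example, not drawn from the cited references), plain gradient descent without global convexity assumptions can indeed converge to a saddle point when initialized on its stable manifold:

```python
import numpy as np

# Toy illustration (not from the cited articles): gradient descent on
# f(x, y) = x**2 - y**2, started on the stable manifold y = 0, converges
# to the saddle point at the origin instead of escaping along the -y**2 direction.
def grad(p):
    x, y = p
    return np.array([2.0 * x, -2.0 * y])

p = np.array([1.0, 0.0])      # initial point with y = 0
for _ in range(200):
    p = p - 0.1 * grad(p)     # plain gradient descent step

print(p)                      # approximately [0, 0]: the iterates stall at the saddle
```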
“…In general, without global assumptions on the objective function such as convexity, gradient-based methods may converge to non-global local minima or saddle points. It therefore becomes important to analyze critical points of the objective function in the training of ANNs and we refer, for example, to [14,59,61,65,66] for articles which study the appearance of critical points in the risk landscape in the training of ANNs. The question under which conditions gradient-based optimization algorithms cannot converge to saddle points was investigated, for example, in [28,42,43,52,53].…”
Section: Introduction and Main Results (mentioning)
confidence: 99%
“… 65). Furthermore, observe that (3.62) and (3.64) prove that for all k ∈ N it holds that Z_2^{(k)} ∈ (Df)(Z_1^{(k)}).…”
mentioning
confidence: 92%
“…Kawaguchi [22] and Laurent & von Brecht [24]). For a connection between the critical points of the risk function and the critical points of the risk function with regard to a larger network width, we refer to the articles Zhang et al. [36,37].…”
Section: Introduction (mentioning)
confidence: 99%