2018
DOI: 10.48550/arxiv.1804.08838
Preprint

Measuring the Intrinsic Dimension of Objective Landscapes

Abstract: Many recently trained neural networks employ large numbers of parameters to achieve good performance. One may intuitively use the number of parameters required as a rough gauge of the difficulty of a problem. But how accurate are such notions? How many parameters are really needed? In this paper we attempt to answer this question by training networks not in their native parameter space, but instead in a smaller, randomly oriented subspace. We slowly increase the dimension of this subspace, note at which dimens…
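As a concrete illustration of the subspace-training idea summarized in the abstract, the following is a minimal NumPy sketch, not the authors' implementation: the native parameters are expressed as a frozen random initialization plus a fixed random projection of a small trainable vector, and only that low-dimensional vector is optimized. The toy quadratic loss and names such as `D`, `d`, and `full_params` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

D = 1000   # number of native parameters (illustrative)
d = 20     # candidate subspace dimension, swept upward in the paper's procedure

# Frozen random initialization and a fixed, approximately orthonormal D x d
# projection; only theta_d (d values) is ever trained.
theta_0 = rng.normal(size=D)
P, _ = np.linalg.qr(rng.normal(size=(D, d)))  # columns form an orthonormal basis
theta_d = np.zeros(d)

def full_params(theta_d):
    # theta^(D) = theta_0 + P theta^(d): a randomly oriented affine subspace.
    return theta_0 + P @ theta_d

# Toy stand-in for a network loss: a quadratic with a known minimizer
# (assumption, only to keep the sketch runnable).
target = rng.normal(size=D)

def loss(theta):
    return 0.5 * np.sum((theta - target) ** 2)

def grad_wrt_subspace(theta_d):
    # Chain rule: dL/dtheta_d = P^T dL/dtheta.
    return P.T @ (full_params(theta_d) - target)

lr = 0.1
for step in range(200):
    theta_d -= lr * grad_wrt_subspace(theta_d)

print("loss reached inside the d-dimensional subspace:", loss(full_params(theta_d)))
```

In the paper's measurement procedure, d is increased until training in the subspace first reaches a preset fraction of the full model's performance; that threshold dimension is what the authors report as the intrinsic dimension of the objective landscape.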

Cited by 37 publications (56 citation statements)
References 10 publications
“…In the absence of such concrete information about the eigenvalue spectrum, many researchers have developed clever ad hoc methods to understand notions of smoothness, curvature, sharpness, and poor conditioning in the landscape of the loss surface. Examples of such work, where some surrogate is defined for the curvature, include the debate on flat vs sharp minima [16,5,29,15], explanations of the efficacy of residual connections [19] and batch normalization [25], the construction of low-energy paths between different local minima [6], qualitative studies and visualizations of the loss surface [11], and characterization of the intrinsic dimensionality of the loss [18]. In each of these cases, detailed knowledge of the entire Hessian spectrum would surely be informative, if not decisive, in explaining the phenomena at hand.…”
Section: Introduction (mentioning)
confidence: 99%
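The excerpt above turns on access to the Hessian spectrum of the loss. As a hedged, self-contained illustration (not taken from any of the cited works), the sketch below estimates the leading Hessian eigenvalue by power iteration on finite-difference Hessian-vector products, so the full Hessian is never materialized; the toy quadratic loss is an assumption made only to keep the example runnable.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500  # number of parameters (illustrative)

# Toy loss with a known Hessian A (assumption): L(theta) = 0.5 * theta^T A theta.
M = rng.normal(size=(n, n))
A = M @ M.T / n

def grad(theta):
    return A @ theta

def hvp(theta, v, eps=1e-4):
    # Finite-difference Hessian-vector product: H v ~ (g(theta + eps v) - g(theta)) / eps.
    return (grad(theta + eps * v) - grad(theta)) / eps

def top_eigenvalue(theta, iters=100):
    # Power iteration using only Hessian-vector products.
    v = rng.normal(size=theta.shape)
    v /= np.linalg.norm(v)
    for _ in range(iters):
        w = hvp(theta, v)
        v = w / np.linalg.norm(w)
    return v @ hvp(theta, v)  # Rayleigh quotient at the converged direction

theta = rng.normal(size=n)
print("estimated top eigenvalue:", top_eigenvalue(theta))
print("exact top eigenvalue:    ", np.linalg.eigvalsh(A)[-1])
```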
“…One intriguing quality of nonconvex landscapes in which the symmetry breaking principle applies is that local minima lie in fixed low-dimensional spaces (see Section 4). This phenomenon of a hidden low-dimensional structure has been observed in various learning problems in DL with real datasets [37,30], and is believed by some researchers to be an important factor of learnability in nonconvex settings. In the context of this work, the hidden low-dimensionality of spurious minima turns out to be a key ingredient to our analytic study, as we now present.…”
Section: Symmetry Breaking In Two-layer ReLU Neural Network (mentioning)
confidence: 78%
“…3 and Table 3). We believe this gives sDANA promise as an algorithm outside of the least squares context, in situations in which loss landscapes can range between alternately curved and very flat, frequently observed in neural network settings (see Ghorbani et al [2019], Li et al [2018], Sagun et al [2016]).…”
Section: Methods (mentioning)
confidence: 93%