2018
DOI: 10.1073/pnas.1806579115

A mean field view of the landscape of two-layer neural networks

Abstract: Significance: Multilayer neural networks have proven extremely successful in a variety of tasks, from image classification to robotics. However, the reasons for this practical success and its precise domain of applicability are unknown. Learning a neural network from data requires solving a complex optimization problem with millions of variables. This is done by stochastic gradient descent (SGD) algorithms. We study the case of two-layer networks and derive a compact description of the SGD dynamics in terms of a…
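The setting the abstract refers to is easy to state concretely. The sketch below is a minimal illustration under my own assumptions (a ReLU activation, a toy linear teacher, and arbitrary hyperparameters, not the paper's exact setup): it trains a two-layer network of the form f(x) = (1/N) Σ_i a_i σ(⟨w_i, x⟩) by one-pass SGD, which is the object whose large-N dynamics a mean-field description summarizes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two-layer network f(x) = (1/N) * sum_i a_i * relu(<w_i, x>), trained by
# one-pass SGD on a toy regression task. Teacher, scaling, and hyperparameters
# are illustrative assumptions, not the paper's exact setup.
d, N = 5, 1000            # input dimension, number of hidden units
lr, steps = 0.05, 20_000  # per-neuron step size, number of SGD samples

w_star = np.zeros(d)
w_star[0] = 1.0           # hypothetical linear teacher

a = rng.normal(size=N)                     # output weights a_i
W = rng.normal(size=(N, d)) / np.sqrt(d)   # hidden weights w_i

def f(x):
    return a @ np.maximum(W @ x, 0.0) / N

for _ in range(steps):
    x = rng.normal(size=d)
    y = w_star @ x + 0.1 * rng.normal()    # noisy sample from the teacher
    h = np.maximum(W @ x, 0.0)
    err = f(x) - y
    # Squared-loss gradients carry a 1/N factor, so these updates amount to
    # SGD with an effective step size lr * N -- the scaling under which each
    # neuron moves by O(lr) per step.
    grad_a = err * h
    grad_W = err * (a * (h > 0.0))[:, None] * x[None, :]
    a -= lr * grad_a
    W -= lr * grad_W

# The mean-field description tracks the empirical distribution of the neuron
# parameters theta_i = (a_i, w_i); print a test error and two crude summaries.
xs = rng.normal(size=(500, d))
test_mse = np.mean((np.maximum(xs @ W.T, 0.0) @ a / N - xs @ w_star) ** 2)
print("test MSE vs teacher:", test_mse)
print("mean |a_i|:", np.abs(a).mean(), " mean ||w_i||:", np.linalg.norm(W, axis=1).mean())
```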

Cited by 476 publications (699 citation statements). References 18 publications.
“…We take n = 10 noisy measurements of this signal. Example 5 is directly inspired by feature models in several recent papers [7,8,52], as well as having philosophical connections to other recent papers [6,46]. Figure 3 shows the performance of the minimum-ℓ2-norm interpolator on Example 5 as we increase the number of features.…”
Section: The Minimum-ℓ2-Norm Interpolator Through the Fourier Lens
mentioning
confidence: 98%
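The object this excerpt studies is straightforward to compute: when the number of features p exceeds the number of samples n, the minimum-ℓ2-norm interpolator is the pseudoinverse solution of the linear system. The sketch below uses a placeholder sparse signal and Gaussian features (my assumptions; it does not reproduce Example 5 or the Fourier construction of the cited paper) and reports test error as the number of features grows.

```python
import numpy as np

rng = np.random.default_rng(1)

# Minimum-l2-norm interpolation with more features than samples:
# beta_hat = pinv(X) @ y is the smallest-norm solution of X @ beta = y
# (it fits the training data exactly, up to numerical error).
# The sparse signal and Gaussian features are placeholders, not the
# construction used in the cited paper.
n = 10          # number of noisy measurements, as in the excerpt
k = 5           # nonzero coordinates in the placeholder signal

def min_norm_test_mse(p, n_test=2000):
    beta_true = np.zeros(p)
    beta_true[:k] = 1.0
    X = rng.normal(size=(n, p))
    y = X @ beta_true + 0.5 * rng.normal(size=n)
    beta_hat = np.linalg.pinv(X) @ y              # minimum-l2-norm interpolant
    X_test = rng.normal(size=(n_test, p))
    return np.mean((X_test @ (beta_hat - beta_true)) ** 2)

for p in (10, 20, 50, 200, 1000):
    print(f"p = {p:5d}   test MSE = {min_norm_test_mse(p):.3f}")
```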
“…≥ 3-layer neural networks. Promising progress has been made in all of these areas, which we recap only briefly below. Regarding the first point, while the optimization landscape for deep neural networks is non-convex and complicated, several independent recent works (an incomplete list is [12][13][14][15][16][17][18]) have shown that overparameterization can make it more attractive, in the sense that optimization algorithms like stochastic gradient descent (SGD) are more likely to actually converge to a global minimum. These interesting insights are mostly unrelated to the question of generalization, and should be viewed as a coincidental benefit of overparameterization. Second, a line of recent work [19][20][21][22] characterizes the inductive biases of commonly used optimization algorithms, thus providing insight into the identity of the global minimum that is selected.…”
mentioning
confidence: 99%
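The inductive-bias point in this excerpt can be checked in the simplest possible setting: on an underdetermined least-squares problem, full-batch gradient descent started from zero converges to the minimum-ℓ2-norm interpolant, because every iterate stays in the row space of the design matrix. The sketch below (toy sizes and random data of my choosing, not taken from the cited works) verifies this numerically against the closed-form pseudoinverse solution.

```python
import numpy as np

rng = np.random.default_rng(2)

# Implicit bias of gradient descent on an underdetermined least-squares problem:
# started from zero, GD converges to the minimum-l2-norm interpolant.
n, p = 10, 100
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

lr = 1e-3                 # comfortably below 2 / lambda_max(X^T X) at these sizes
beta = np.zeros(p)
for _ in range(50_000):
    beta -= lr * X.T @ (X @ beta - y)   # full-batch gradient of 0.5 * ||X beta - y||^2

beta_min_norm = np.linalg.pinv(X) @ y   # closed-form minimum-l2-norm interpolant
print("max |beta_GD - beta_min_norm| :", np.abs(beta - beta_min_norm).max())
print("training residual             :", np.linalg.norm(X @ beta - y))
```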
“…We show that the neural network output in the large hidden-units and large SGD-iterates limit depends on paths of representative weights that go from input to output layer. This result is then used to show that, under suitable assumptions, the limit neural network seeks to minimize the limit objective function and achieve zero loss. Recently, laws of large numbers and central limit theorems have been established for neural networks with a single hidden layer [10,30,43,48,49,50]. For a single hidden layer, one can directly study the weak convergence of the empirical measure of the parameters.…”
mentioning
confidence: 99%
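Schematically, the law-of-large-numbers statement this excerpt refers to says that the empirical measure of the parameters converges weakly to a deterministic limit that evolves by a nonlinear transport equation. The display below sketches the usual form of this limit; constants, step-size scaling, regularization, and regularity conditions are omitted, and the notation is generic rather than quoted from the cited works.

```latex
% Schematic mean-field (law-of-large-numbers) limit for a single hidden layer;
% regularity conditions, step-size scaling, and constants are omitted.
\[
  \hat{\rho}^{(N)}_{k} \;=\; \frac{1}{N}\sum_{i=1}^{N}\delta_{\theta_i^{k}}
  \;\;\Longrightarrow\;\; \rho_t
  \qquad \text{as } N\to\infty,\ \varepsilon\to 0,\ k=\lfloor t/\varepsilon\rfloor,
\]
\[
  \partial_t \rho_t \;=\; \nabla_\theta\!\cdot\!\bigl(\rho_t\,\nabla_\theta \Psi(\theta;\rho_t)\bigr),
  \qquad
  \Psi(\theta;\rho) \;=\; V(\theta) + \int U(\theta,\tilde{\theta})\,\rho(\mathrm{d}\tilde{\theta}).
\]
```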
“…Hence one important question for approximation of a function is the stability of this process [HR17]. Another approach is to develop the mean-field equations to approximate the function space with fewer equations that are easier to handle [MMN18]. Poggio et al. have found a bound on the complexity of neural networks with smooth (e.g.
Section: -15 Cone Bipolars 1 Beta Ganglion Cell
mentioning
confidence: 99%