2021
DOI: 10.48550/arxiv.2111.02278
Preprint

Mean-field Analysis of Piecewise Linear Solutions for Wide ReLU Networks

Abstract: Understanding the properties of neural networks trained via stochastic gradient descent (SGD) is at the heart of the theory of deep learning. In this work, we take a mean-field view, and consider a two-layer ReLU network trained via SGD for a univariate regularized regression problem. Our main result is that SGD is biased towards a simple solution: at convergence, the ReLU network implements a piecewise linear map of the inputs, and the number of "knot" points - i.e., points where the tangent of the ReLU network…
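To make the setting concrete, below is a minimal sketch (not the authors' code) of the setup described in the abstract: a two-layer ReLU network f(x) = Σ_j a_j ReLU(w_j x + b_j) trained by single-sample SGD on a univariate regression task with an L2 (weight-decay) regularizer, followed by a count of the knot points of the learned piecewise linear map between consecutive training inputs. The width, learning rate, regularization strength, and target function are illustrative assumptions.

```python
# Minimal sketch of the abstract's setting; hyperparameters and data are assumptions.
import numpy as np

rng = np.random.default_rng(0)
m, lr, lam, steps = 512, 0.05, 1e-4, 20_000   # width, step size, L2 strength, SGD steps

# Univariate regression data (illustrative target function).
x_train = np.linspace(-1.0, 1.0, 10)
y_train = np.sin(np.pi * x_train)

# Two-layer ReLU network f(x) = sum_j a_j * relu(w_j * x + b_j),
# with a mean-field-style 1/m scaling of the output weights.
w = rng.normal(size=m)
b = rng.normal(size=m)
a = rng.normal(size=m) / m

for t in range(steps):
    i = rng.integers(len(x_train))            # single-sample SGD
    x, y = x_train[i], y_train[i]
    pre = w * x + b
    act = np.maximum(pre, 0.0)
    err = np.dot(a, act) - y                  # residual f(x) - y
    mask = (pre > 0.0).astype(float)          # ReLU subgradient
    # Gradients of 0.5*err^2 + 0.5*lam*(|a|^2 + |w|^2 + |b|^2)
    ga = err * act + lam * a
    gw = err * a * mask * x + lam * w
    gb = err * a * mask + lam * b
    a -= lr * ga
    w -= lr * gw
    b -= lr * gb

# The trained network is piecewise linear in x; its tangent can only change at the
# activation boundaries x = -b_j / w_j of neurons that contribute to the output.
valid = (np.abs(a) > 1e-8) & (np.abs(w) > 1e-8)
knots = np.sort(-b[valid] / w[valid])
knots = knots[(knots > x_train.min()) & (knots < x_train.max())]
per_gap = [int(np.sum((knots > lo) & (knots < hi)))
           for lo, hi in zip(x_train[:-1], x_train[1:])]
print("knot points between consecutive training inputs:", per_gap)
```

The per-gap knot count printed at the end is the quantity that the paper's main result bounds at convergence.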

Cited by 2 publications (2 citation statements)
References 11 publications

Citation statements, ordered by relevance:
“…These norms are also connected to the implicit biases of optimization methods for training neural networks. In the context of univariate functions, the dynamics of gradient descent was shown to be biased towards (adaptive) linear or cubic spline depending on the optimization regime [24,40,45], and these results have been partially extended to the multivariate case [18]. For classification problems, the implicit bias of gradient descent was connected to a variational problem related to R-norm with margin constraints on the data [2], which we explore in our empirical study.…”
Section: Related Work
Citation type: mentioning
Confidence: 99%
“…Mean-field analyses seek to explain the uncanny generalization abilities of DNNs by considering the distributional dynamics of idealized networks with infinitely many hidden layer neurons by considering both the stochastic equations corresponding to said dynamics, as well as Fokker-Planck-type PDEs modeling the flow of the distribution of the weights in the network. Analyses along these lines in the contemporary literature include [31,39]. Although this line of works does use limiting arguments involving stochastic and distributional dynamics, they do not directly consider the Langevin differential inclusion and the potential PDE solution for the distribution, as we do here.…”
Section: Related Work
Citation type: mentioning
Confidence: 99%
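As a rough illustration of the distributional viewpoint described in that statement (an assumption-laden sketch, not the construction of [31,39] or of this paper), one can treat the m neurons of a wide two-layer ReLU network as interacting particles θ_j = (a_j, w_j, b_j) evolving under noisy (Langevin-type) gradient descent; the empirical measure (1/m) Σ_j δ_{θ_j} is the finite-width object whose m → ∞ limit the Fokker-Planck-type PDEs model. Width, step size, noise level, and data below are illustrative assumptions.

```python
# Particle (finite-width) view of the mean-field / Fokker-Planck picture; all
# hyperparameters and data here are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
m, lr, beta, steps = 2000, 0.02, 1e-3, 5_000   # width, step size, noise level, steps
x_train = np.linspace(-1.0, 1.0, 10)
y_train = np.sin(np.pi * x_train)

theta = rng.normal(size=(m, 3))   # each row is one neuron (a_j, w_j, b_j)
theta[:, 0] /= m                  # mean-field output scaling

for t in range(steps):
    a, w, b = theta[:, 0], theta[:, 1], theta[:, 2]
    pre = np.outer(x_train, w) + b               # (n, m) pre-activations
    act = np.maximum(pre, 0.0)
    err = act @ a - y_train                      # (n,) residuals on the full batch
    mask = (pre > 0.0).astype(float)
    grad = np.stack([act.T @ err,                     # dL/da_j
                     a * (mask.T @ (err * x_train)),  # dL/dw_j
                     a * (mask.T @ err)], axis=1)     # dL/db_j
    noise = rng.normal(size=theta.shape)
    # Langevin-type update: gradient drift plus small Gaussian noise.
    theta = theta - lr * grad + np.sqrt(2.0 * lr * beta) * noise

# The empirical measure of the rows of `theta` is the finite-width proxy for the
# limiting weight distribution whose evolution a Fokker-Planck-type PDE describes.
print("quantiles of the inner weights w:",
      np.quantile(theta[:, 1], [0.1, 0.5, 0.9]).round(3))
```

The simulation only tracks the finite-m particle approximation; the mean-field analyses referenced above study the limiting distribution itself.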