2019
DOI: 10.48550/arxiv.1906.07842
Preprint

Gradient Dynamics of Shallow Univariate ReLU Networks

Francis Williams, Matthew Trager, Claudio Silva, et al.

Abstract: We present a theoretical and empirical study of the gradient dynamics of overparameterized shallow ReLU networks with one-dimensional input, solving least-squares interpolation. We show that the gradient dynamics of such networks are determined by the gradient flow in a non-redundant parameterization of the network function. We examine the principal qualitative features of this gradient flow. In particular, we determine conditions for two learning regimes: kernel and adaptive, which depend both on the relative…
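
To make the setting in the abstract concrete, here is a minimal sketch (not the authors' code, and using the standard parameterization rather than their non-redundant one) of an overparameterized shallow univariate ReLU network trained by full-batch gradient descent on a least-squares interpolation objective. The width, learning rate, target function, and initialization scale are illustrative assumptions.

```python
# Sketch: shallow univariate ReLU network f(x) = sum_i a_i*relu(w_i*x + b_i) + c
# trained by full-batch gradient descent on the loss (1/(2n)) * sum_j r_j^2.
import numpy as np

rng = np.random.default_rng(0)
n_train, width, lr, steps = 20, 200, 1e-2, 5000

# Training data: interpolate a 1D target from a few samples (illustrative).
x = np.linspace(-1.0, 1.0, n_train)
y = np.sin(np.pi * x)

# Overparameterized shallow ReLU network parameters.
w = rng.normal(size=width)          # input weights
b = rng.normal(size=width)          # biases
a = rng.normal(size=width) / width  # output weights
c = 0.0                             # output bias

def forward(x):
    pre = np.outer(x, w) + b        # (n, width) pre-activations
    return np.maximum(pre, 0.0) @ a + c

for _ in range(steps):
    pre = np.outer(x, w) + b
    h = np.maximum(pre, 0.0)
    r = h @ a + c - y               # residuals
    mask = (pre > 0).astype(float)
    # Exact gradients of (1/(2n)) * sum_j r_j^2 w.r.t. all parameters.
    grad_a = h.T @ r / n_train
    grad_c = r.mean()
    grad_w = ((mask * a) * x[:, None]).T @ r / n_train
    grad_b = (mask * a).T @ r / n_train
    a -= lr * grad_a; c -= lr * grad_c
    w -= lr * grad_w; b -= lr * grad_b

print("final MSE:", float(np.mean((forward(x) - y) ** 2)))
```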

Cited by 9 publications (9 citation statements), published 2019–2024. References: 22 publications.
“…Yet, in practical-sized NNs the spectrum of G_t is neither constant nor similar to its initialization. Several recent studies explored its adaptive dynamics [4,19,20], yet most of the work was done for single- or two-layer NNs. Likewise, in [5,9] mathematical expressions for NTK dynamics were developed for a general NN architecture.…”
Section: Related Work
mentioning, confidence: 99%
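
The quoted passage concerns the time-varying spectrum of the NTK Gram matrix G_t. As a hedged illustration (not code from the cited works), the sketch below assembles the empirical NTK Gram matrix J Jᵀ of a shallow univariate ReLU network from its parameter Jacobian; the network size, data, and initialization are illustrative assumptions.

```python
# Sketch: empirical NTK Gram matrix G[j, k] = <df(x_j)/dθ, df(x_k)/dθ> for the
# shallow univariate ReLU network f(x) = sum_i a_i*relu(w_i*x + b_i) + c.
import numpy as np

def ntk_gram(x, w, b, a):
    pre = np.outer(x, w) + b             # (n, width)
    h = np.maximum(pre, 0.0)             # df/da_i
    mask = (pre > 0).astype(float)
    d_w = mask * a * x[:, None]          # df/dw_i
    d_b = mask * a                       # df/db_i
    d_c = np.ones((len(x), 1))           # df/dc
    J = np.concatenate([h, d_w, d_b, d_c], axis=1)  # (n, P) parameter Jacobian
    return J @ J.T                       # (n, n) Gram matrix G_t

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 20)
width = 200
w, b = rng.normal(size=width), rng.normal(size=width)
a = rng.normal(size=width) / width
G0 = ntk_gram(x, w, b, a)
print("top eigenvalues at init:", np.linalg.eigvalsh(G0)[-3:])
# Re-evaluating ntk_gram after training shows how the spectrum drifts in the
# adaptive regime, versus staying nearly fixed in the kernel regime.
```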
“…It is interesting that even when allowing for more input arguments, the resultant learned nonlinearities still favor low-order quadratic functions (Figure 8b-d). This could be explained by an implicit bias toward smooth functions (Williams et al., 2019; Sahs et al., 2020) while still bending the input space to provide useful computations. Perhaps the learned nonlinearities are as random as possible while fulfilling these minimal conditions.…”
Section: Discussion
mentioning, confidence: 99%
“…For Neural Splines, the kernel norm favors smooth functions: it is proportional to curvature (‖f‖_K ≈ ‖f''‖) for 1D curves [65] and to the Radon transform of the Laplacian (‖f‖_K ≈ ‖R{Δ²f}‖) for 3D implicit surfaces [43,64]. While an inductive bias favoring smoothness is good for reconstructing shapes with dense samples, it is too weak a prior in more challenging cases, such as when the input points are very sparse or only cover part of a shape.…”
Section: Inductive Bias Of Neural Splines
mentioning, confidence: 99%
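
To illustrate the 1D statement in this quote (a sketch under my own assumptions, not code from [65] or the citing paper): for a piecewise-linear shallow ReLU network, a curvature seminorm of the form ∫|f''| reduces to the total variation of f', i.e. the sum of absolute slope changes at the kinks that fall in the interval of interest. The interval, width, and random parameters below are illustrative.

```python
# Sketch: 1D curvature-like seminorm of a shallow ReLU network
# f(x) = sum_i a_i*relu(w_i*x + b_i) + c, in closed form and numerically.
import numpy as np

def curvature_seminorm(w, b, a, lo=-3.0, hi=3.0):
    # Unit i has a kink at x = -b_i / w_i; crossing it changes the slope of f
    # by a_i * w_i, so the seminorm over [lo, hi] is sum_i |a_i * w_i| over
    # the kinks inside the interval (assuming distinct kink locations).
    kinks = -b / w
    inside = (kinks > lo) & (kinks < hi)
    return np.sum(np.abs(a * w)[inside])

def curvature_seminorm_numeric(w, b, a, c=0.0, lo=-3.0, hi=3.0, n=20001):
    # Numerical check: accumulate |change in slope| of f on a fine grid.
    x = np.linspace(lo, hi, n)
    f = np.maximum(np.outer(x, w) + b, 0.0) @ a + c
    slopes = np.diff(f) / np.diff(x)
    return np.sum(np.abs(np.diff(slopes)))

rng = np.random.default_rng(0)
w, b, a = rng.normal(size=8), rng.uniform(-1, 1, size=8), rng.normal(size=8)
print(curvature_seminorm(w, b, a))           # closed form
print(curvature_seminorm_numeric(w, b, a))   # numerical approximation
```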