2021
DOI: 10.48550/arxiv.2103.05243
Preprint

On the Generalization Power of Overfitted Two-Layer Neural Tangent Kernel Models

Abstract: In this paper, we study the generalization performance of min ℓ2-norm overfitting solutions for the neural tangent kernel (NTK) model of a two-layer neural network. We show that, depending on the ground-truth function, the test error of overfitted NTK models exhibits characteristics that are different from the "double-descent" of other overparameterized linear models with simple Fourier or Gaussian features. Specifically, for a class of learnable functions, we provide a new upper bound of the generalization er…
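The model studied in the abstract is the linearized (NTK) approximation of a two-layer ReLU network with random, fixed first-layer weights, fit to the training data by the minimum ℓ2-norm interpolating solution. Below is a minimal NumPy sketch of that setup, not the authors' code: the data distribution, ground-truth function, and problem sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative problem sizes (assumptions, not from the paper):
# n training samples, input dimension d, p hidden neurons, with p*d >> n.
n, d, p = 50, 5, 200

# Training inputs on the unit sphere; a simple hypothetical ground-truth function.
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)
y = np.sign(X[:, 0])

# Random first-layer weights, held fixed (the linearized / NTK regime).
W = rng.standard_normal((p, d))

def ntk_features(X, W):
    """NTK feature map of a two-layer ReLU network w.r.t. its first-layer weights:
    for each neuron j the gradient is 1{w_j^T x > 0} * x, scaled by 1/sqrt(p)."""
    gates = (X @ W.T > 0).astype(float)                          # (n, p) activation pattern
    feats = gates[:, :, None] * X[:, None, :]                    # (n, p, d)
    return feats.reshape(X.shape[0], -1) / np.sqrt(W.shape[0])   # (n, p*d)

H = ntk_features(X, W)                                           # design matrix, n x (p*d)

# Minimum l2-norm solution that exactly fits (overfits) the training data:
# dw = H^T (H H^T)^{-1} y, well defined when H has full row rank (p*d >> n).
dw = H.T @ np.linalg.solve(H @ H.T, y)
print("max train residual:", np.max(np.abs(H @ dw - y)))         # ~0: interpolation

# Test error of the overfitted solution on fresh samples from the same distribution.
X_test = rng.standard_normal((500, d))
X_test /= np.linalg.norm(X_test, axis=1, keepdims=True)
y_test = np.sign(X_test[:, 0])
print("test MSE:", np.mean((ntk_features(X_test, W) @ dw - y_test) ** 2))
```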

Cited by 2 publications (14 citation statements)
References 14 publications (30 reference statements)
“…However, these studies usually do not quantify how the generalization performance depends on the number of neurons p. Specifically, they usually provide an upper bound on the generalization error when the number of neurons p is greater than a threshold, while the upper bound itself does not depend on p. Thus, such an upper bound cannot explain the descent behavior of NTK models. The work in Ju et al [2021] does study the descent behavior with respect to p, and is therefore the closest to our work. However, as we have explained earlier, there are crucial differences between 2 and 3 layers in both the descent behavior and the learnable set of ground-truth functions.…”
Section: Introduction (supporting)
confidence: 79%
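The "descent behavior" discussed in this excerpt refers to how the test error of the minimum ℓ2-norm (overfitted) NTK solution changes as the number of neurons p grows. A rough way to observe it empirically, reusing the illustrative ntk_features helper and data from the sketch above (names and sizes are assumptions, not from the cited papers):

```python
# Sweep the number of neurons p; for each p, fit the min l2-norm
# interpolating NTK solution and record its test mean squared error.
for p in [20, 50, 100, 200, 500, 1000]:
    W = rng.standard_normal((p, d))                 # fresh random first layer
    H, H_test = ntk_features(X, W), ntk_features(X_test, W)
    dw = H.T @ np.linalg.solve(H @ H.T, y)          # min-norm interpolator
    print(f"p={p:5d}  test MSE={np.mean((H_test @ dw - y_test) ** 2):.4f}")
```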
“…Based on the upper bound in Theorem 3, we have the following insights for 3-layer NTK, which are similar to those for 2-layer NTK shown in Ju et al [2021]. These similarities may reveal some intrinsic properties of the NTK models regardless of the number of layers.…”
Section: Interpretations Similar to 2-Layer NTK (supporting)
confidence: 63%