“…Specifically, suppose that we fix the output of the first hidden layer (i.e., $\mathbf{h}^{\mathrm{RF}}_{\mathbf{V},\mathbf{x}}$) and regard it as the input of a 2-layer NTK formed by the top two layers of the 3-layer neural network. By letting $p_2 \to \infty$, we can show that the inner product between $\mathbf{h}^{\mathrm{Three}}_{\mathbf{V},\mathbf{W}_0,\mathbf{x}}$ and $\mathbf{h}^{\mathrm{Three}}_{\mathbf{V},\mathbf{W}_0,\mathbf{X}_i}$ approaches $K^{\mathrm{Two}}\big((\mathbf{h}^{\mathrm{RF}}_{\mathbf{V},\mathbf{x}})^{T}\mathbf{h}^{\mathrm{RF}}_{\mathbf{V},\mathbf{X}_i}\big)$ (with the necessary normalization of $\mathbf{h}^{\mathrm{RF}}_{\mathbf{V},\mathbf{x}}$ and $\mathbf{h}^{\mathrm{RF}}_{\mathbf{V},\mathbf{X}_i}$), where $K^{\mathrm{Two}}$ is exactly the kernel of the 2-layer NTK in Ju et al. [2021]. Second, when $p_1 \to \infty$, we can show that $(\mathbf{h}^{\mathrm{RF}}_{\mathbf{V},\mathbf{x}})^{T}\mathbf{h}^{\mathrm{RF}}_{\mathbf{V},\mathbf{X}_i}$ approaches $K^{\mathrm{RF}}(\mathbf{x}^{T}\mathbf{X}_i)$, where $K^{\mathrm{RF}}$ is exactly the kernel of the random-feature model.…”
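Read together, the two limits in this passage compose; the following is an informal sketch in the passage's own notation (normalization factors omitted, and assuming $K^{\mathrm{Two}}$ is continuous so that the limit can be passed inside):
\[
\big(\mathbf{h}^{\mathrm{Three}}_{\mathbf{V},\mathbf{W}_0,\mathbf{x}}\big)^{T}\mathbf{h}^{\mathrm{Three}}_{\mathbf{V},\mathbf{W}_0,\mathbf{X}_i}
\;\xrightarrow{\;p_2\to\infty\;}\;
K^{\mathrm{Two}}\!\Big(\big(\mathbf{h}^{\mathrm{RF}}_{\mathbf{V},\mathbf{x}}\big)^{T}\mathbf{h}^{\mathrm{RF}}_{\mathbf{V},\mathbf{X}_i}\Big)
\;\xrightarrow{\;p_1\to\infty\;}\;
K^{\mathrm{Two}}\!\big(K^{\mathrm{RF}}(\mathbf{x}^{T}\mathbf{X}_i)\big),
\]
i.e., in the joint limit the 3-layer feature inner product reduces to the 2-layer NTK kernel applied to the random-feature kernel of the raw inputs.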