“…Specifically, suppose that we fix the output of the first hidden layer (i.e., $\mathbf{h}^{\mathrm{RF}}_{\mathbf{V},\mathbf{x}}$) and regard it as the input of a 2-layer NTK formed by the top two layers of the 3-layer neural network. By letting $p_2 \to \infty$, we can show that the inner product between $\mathbf{h}^{\mathrm{Three}}_{\mathbf{V},\mathbf{W}_0,\mathbf{x}}$ and $\mathbf{h}^{\mathrm{Three}}_{\mathbf{V},\mathbf{W}_0,\mathbf{X}_i}$ approaches $K^{\mathrm{Two}}\big((\mathbf{h}^{\mathrm{RF}}_{\mathbf{V},\mathbf{x}})^{T}\mathbf{h}^{\mathrm{RF}}_{\mathbf{V},\mathbf{X}_i}\big)$ (with the necessary normalization of $\mathbf{h}^{\mathrm{RF}}_{\mathbf{V},\mathbf{x}}$ and $\mathbf{h}^{\mathrm{RF}}_{\mathbf{V},\mathbf{X}_i}$), where $K^{\mathrm{Two}}$ is exactly the kernel of the 2-layer NTK in Ju et al. [2021]. Second, when $p_1 \to \infty$, we can show that $(\mathbf{h}^{\mathrm{RF}}_{\mathbf{V},\mathbf{x}})^{T}\mathbf{h}^{\mathrm{RF}}_{\mathbf{V},\mathbf{X}_i}$ approaches $K^{\mathrm{RF}}(\mathbf{x}^{T}\mathbf{X}_i)$, where $K^{\mathrm{RF}}$ is exactly the kernel of the random-feature model.…”
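Read together, the two limits in this passage compose; the following is an informal sketch in the passage's own notation (normalization factors omitted, and assuming $K^{\mathrm{Two}}$ is continuous so that the limit can be passed inside):
\[
\big(\mathbf{h}^{\mathrm{Three}}_{\mathbf{V},\mathbf{W}_0,\mathbf{x}}\big)^{T}\mathbf{h}^{\mathrm{Three}}_{\mathbf{V},\mathbf{W}_0,\mathbf{X}_i}
\;\xrightarrow{\;p_2\to\infty\;}\;
K^{\mathrm{Two}}\!\Big(\big(\mathbf{h}^{\mathrm{RF}}_{\mathbf{V},\mathbf{x}}\big)^{T}\mathbf{h}^{\mathrm{RF}}_{\mathbf{V},\mathbf{X}_i}\Big)
\;\xrightarrow{\;p_1\to\infty\;}\;
K^{\mathrm{Two}}\!\big(K^{\mathrm{RF}}(\mathbf{x}^{T}\mathbf{X}_i)\big),
\]
i.e., in the joint limit the 3-layer feature inner product reduces to the 2-layer NTK kernel applied to the random-feature kernel of the raw inputs.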