2022
DOI: 10.48550/arxiv.2204.01058
Preprint

Random Fully Connected Neural Networks as Perturbatively Solvable Hierarchies

Abstract: This article considers fully connected neural networks with Gaussian random weights and biases and L hidden layers, each of width proportional to a large parameter n. For polynomially bounded non-linearities we give sharp estimates in powers of 1/n for the joint correlation functions of the network output and its derivatives. Moreover, we obtain exact layerwise recursions for these correlation functions and solve a number of special cases for classes of non-linearities including ReLU and tanh. We find in both …
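
As a rough illustration of the finite-width effects the abstract refers to, the following minimal NumPy sketch (not taken from the paper; the initialization scheme, depth, widths, and sample sizes are illustrative assumptions) samples random fully connected ReLU networks and estimates the excess kurtosis of the scalar output at a fixed input, which is expected to shrink roughly like 1/n at fixed depth:

```python
import numpy as np

def random_relu_network_output(x, n, L, n_samples, rng):
    """Sample scalar outputs of independent random fully connected ReLU networks.

    Hidden weights ~ N(0, 2/fan_in), readout weights ~ N(0, 1/n), biases set to zero;
    L hidden layers, each of width n (illustrative choices, not the paper's exact setup).
    """
    outs = np.empty(n_samples)
    for s in range(n_samples):
        h, fan_in = x, x.shape[0]
        for _ in range(L):
            W = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(n, fan_in))
            h = np.maximum(W @ h, 0.0)  # ReLU hidden layer
            fan_in = n
        w_out = rng.normal(0.0, np.sqrt(1.0 / fan_in), size=fan_in)
        outs[s] = w_out @ h
    return outs

rng = np.random.default_rng(0)
x = np.ones(10)
for n in (8, 32, 128):
    z = random_relu_network_output(x, n=n, L=3, n_samples=10_000, rng=rng)
    # Excess kurtosis of the output distribution; a Gaussian gives 0,
    # and finite-width corrections are expected to decay roughly like 1/n.
    kurt = np.mean((z - z.mean()) ** 4) / np.var(z) ** 2 - 3.0
    print(f"n = {n:4d}   excess kurtosis ≈ {kurt:+.3f}")
```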

Cited by 4 publications (11 citation statements)
References 44 publications
“…Corrections to this lazy limit can be extracted at small but finite $\gamma_0$. This is conceptually similar to recent works which consider perturbation series for the NTK in powers of $1/N$ [33,26,27] (though not identical, see Appendix N.7). We expand all observables $q(\gamma_0)$ in a power series in $\gamma_0$, giving $q(\gamma_0) = q^{(0)} + \gamma_0 q^{(1)} + \gamma_0^2 q^{(2)} + \dots$, and compute corrections up to $O(\gamma_0^2)$.…”
Section: Perturbation Theory in $\gamma_0$ at Infinite Width (supporting)
confidence: 84%
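
For context, the order-by-order matching behind such an expansion is standard perturbation theory; the LaTeX sketch below is a generic illustration only (the constraint $F$ and the observable are placeholders, not the equations of the cited work). If $q(\gamma_0)$ satisfies an analytic constraint $F(q, \gamma_0) = 0$, substituting the series and collecting powers of $\gamma_0$ gives:

```latex
% Generic order-by-order matching; F is a placeholder constraint,
% not the cited work's actual equations.
\begin{align}
  q(\gamma_0) &= q^{(0)} + \gamma_0\, q^{(1)} + \gamma_0^2\, q^{(2)} + O(\gamma_0^3),\\
  O(1):\quad & F = 0,\\
  O(\gamma_0):\quad & \partial_q F\, q^{(1)} + \partial_{\gamma_0} F = 0,\\
  O(\gamma_0^2):\quad & \partial_q F\, q^{(2)}
      + \tfrac{1}{2}\,\partial_q^2 F\,\bigl(q^{(1)}\bigr)^2
      + \partial_q \partial_{\gamma_0} F\, q^{(1)}
      + \tfrac{1}{2}\,\partial_{\gamma_0}^2 F = 0,
\end{align}
% with all derivatives of F evaluated at (q^{(0)}, 0); solving these in
% sequence yields the corrections q^{(1)}, q^{(2)} up to O(\gamma_0^2).
```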
“…, which is consistent with finite width effective field theory at $\gamma = O_N(1)$ [26,27] (Appendix N.6). Further, at the leading order correction, all temporal dependencies are controlled by $P(P+1)$ functions $v_\alpha(t) = \int_0^t ds\, \Delta^0_\alpha(s)$ and $v_{\alpha\beta}(t) = \int_0^t ds\, \Delta^0_\alpha(s) \int_0^s ds'\, \Delta^0_\beta(s')$, which is consistent with those derived for finite width NNs using a truncation of the Neural Tangent Hierarchy [32,33,26].…”
Section: Perturbation Theory in $\gamma_0$ at Infinite Width (supporting)
confidence: 82%
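
As a small numerical aside (the trajectories $\Delta^0_\alpha$, $\Delta^0_\beta$ below are arbitrary placeholders, since their actual form is not given in the excerpt), running integrals of the form $v_\alpha(t)$ and nested double integrals of the form $v_{\alpha\beta}(t)$ can be evaluated with cumulative trapezoidal quadrature:

```python
import numpy as np

# Placeholder leading-order trajectories standing in for Δ⁰_α(s) and Δ⁰_β(s).
t = np.linspace(0.0, 5.0, 2001)
delta_a = np.exp(-t)
delta_b = np.exp(-2.0 * t)

def cumtrapz(y, dt):
    """Running trapezoidal integral ∫₀ᵗ y(s) ds on a uniform grid."""
    return np.concatenate(([0.0], np.cumsum(0.5 * (y[1:] + y[:-1]) * dt)))

dt = t[1] - t[0]
v_a = cumtrapz(delta_a, dt)             # v_α(t) = ∫₀ᵗ ds Δ⁰_α(s)
inner_b = cumtrapz(delta_b, dt)         # ∫₀ˢ ds' Δ⁰_β(s')
v_ab = cumtrapz(delta_a * inner_b, dt)  # v_αβ(t): nested double integral

print(f"v_a(5) ≈ {v_a[-1]:.4f},  v_ab(5) ≈ {v_ab[-1]:.4f}")
```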
“…In this setting, both the network depth d and the width n of each layer are simultaneously scaled to infinity, while their relative ratio d/n remains fixed [23, 25-29]. Recent work also explores using d/n as an effective perturbation parameter [30-32] or to study concentration bounds in terms of d/n [5,33]. This limit has the distinct advantage of being incredibly accurate at predicting the output distribution for finite size networks at initialization [27] - a significant improvement over the NNGP theory.…”
Section: Introduction (mentioning)
confidence: 99%