2022
DOI: 10.48550/arxiv.2205.09653
Preprint

Self-Consistent Dynamical Field Theory of Kernel Evolution in Wide Neural Networks

Abstract: We analyze feature learning in infinite width neural networks trained with gradient flow through a self-consistent dynamical field theory. We construct a collection of deterministic dynamical order parameters which are inner-product kernels for hidden unit activations and gradients in each layer at pairs of time points, providing a reduced description of network activity through training. These kernel order parameters collectively define the hidden layer activation distribution, the evolution of the neural tan…
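
The following is only a minimal numerical sketch of the kind of object the abstract describes: a feature-kernel order parameter Phi(t, s), the inner product of hidden activations at two training times, tracked empirically in a small two-layer network trained by gradient descent. The width, step size, toy data, and mean-field scaling choices here are illustrative assumptions, not the paper's setup; gradient kernels could be tracked the same way.

```python
# Minimal sketch (not the paper's code) of the feature-kernel order parameter
# Phi(t, s) = (1/N) phi(h(t)) . phi(h(s))^T from inner products of hidden
# activations at pairs of training times, for a small two-layer MLP.
import numpy as np

rng = np.random.default_rng(0)
P, D, N, steps, lr = 20, 5, 512, 100, 0.05   # samples, input dim, width, steps, base step size

X = rng.standard_normal((P, D))              # toy inputs
y = np.sign(X[:, 0])                         # toy +/-1 targets

W0 = rng.standard_normal((D, N))             # hidden weights
w1 = rng.standard_normal(N)                  # readout weights

acts = []                                    # hidden activations phi(h(t)) at every step t
for t in range(steps):
    h = X @ W0 / np.sqrt(D)                  # preactivations, shape (P, N)
    phi = np.tanh(h)
    f = phi @ w1 / N                         # mean-field (1/N) readout so features move at large width
    acts.append(phi)

    err = f - y                              # squared-loss residual
    grad_w1 = phi.T @ err / N
    grad_W0 = X.T @ (np.outer(err, w1 / N) * (1 - phi**2)) / np.sqrt(D)
    w1 = w1 - lr * N * grad_w1               # learning rate scaled by N, as in mean-field training
    W0 = W0 - lr * N * grad_W0

def feature_kernel(t, s):
    """Phi(t, s)[i, j] = (1/N) sum_k phi_k(x_i; t) phi_k(x_j; s)."""
    return acts[t] @ acts[s].T / N

# The kernel is deterministic in the infinite-width limit; at finite N it fluctuates,
# but its movement over training is the feature-learning signal.
print(np.linalg.norm(feature_kernel(steps - 1, steps - 1) - feature_kernel(0, 0)))
```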

Cited by 2 publications (5 citation statements) · References 42 publications (70 reference statements)

“…Overall, leveraging the large size of the network involved in PL, we have developed a statistical mechanics theory of PL in deep neural networks which provides mechanistic and normative understanding of several important empirical findings of PL. This work complements recent theoretical studies of learning in deep networks [27][28][29][30][31][32][33][34][35], contributing to the understanding of learning and computation in these important architectures.…”
Section: Introduction (supporting)
confidence: 65%
“…In the present work, we directly addressed the issue of PL in a deep network by studying PL of a fine discrimination task in a deep neural network (DNN) model of the sensory hierarchy [24][25][26]. As learning dynamics in DNNs are in general challenging to study [27][28][29][30][31][32][33][34][35], we developed a mean-field theory of information propagation in the model at the limit of large numbers of neurons in every layer and large number of training examples. The theory reveals that during the perceptual task, the DNN effectively behaves like a deep linear neural network.…”
Section: Introduction (mentioning)
confidence: 99%
“…DMFT methods have been used to analyze the test loss dynamics for general linear and spiked tensor models trained with high-dimensional random data (Mannelli et al, 2019;Mignacco et al, 2020;Mignacco & Urbani, 2022) and deep network dynamics with random initialization (Bordelon & Pehlevan, 2022b;Bordelon et al, 2023). High dimensional limits of SGD have been analyzed with Volterra integral equations in the offline case (Paquette et al, 2021) or with recursive matrix equations in the online case (Varre et al, 2021;Bordelon & Pehlevan, 2022a).…”
Section: Related Work (mentioning)
confidence: 99%
“…Further analyses of the after-kernels of feature learning networks are performed in Appendix L. We see that the kernels continue to evolve substantially throughout training. This indicates that a full explanation of the compute optimal scaling exponents will require something resembling a mechanistic theory of kernel evolution (Long, 2021;Fort et al, 2020;Atanasov et al, 2022;Bordelon & Pehlevan, 2022b).…”
Section: The Role of Feature Learning (mentioning)
confidence: 99%
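
As a companion to the quoted observation that after-kernels keep evolving, here is a minimal sketch (toy setup assumed, not the cited works' code) of how one might measure that empirically: recompute the empirical NTK of a small MLP at several checkpoints and compare consecutive kernels.

```python
# Minimal sketch of tracking an "after-kernel": the empirical NTK of a small MLP,
# recomputed at several checkpoints during training, to see whether the kernel
# keeps moving rather than freezing at its initial value.
import numpy as np

rng = np.random.default_rng(1)
P, D, N, steps, lr = 16, 4, 128, 1000, 0.01

X = rng.standard_normal((P, D))
y = np.sign(X[:, 0])

W0 = rng.standard_normal((D, N)) / np.sqrt(D)   # standard (NTK-style) initialization scaling
w1 = rng.standard_normal(N) / np.sqrt(N)

def empirical_ntk(W0, w1, X):
    """Gram matrix of per-example gradients of the output w.r.t. all parameters."""
    phi = np.tanh(X @ W0)
    g_w1 = phi                                                              # df/dw1, shape (P, N)
    g_W0 = X[:, :, None] * (w1[None, None, :] * (1 - phi**2)[:, None, :])  # df/dW0, shape (P, D, N)
    G = np.concatenate([g_w1, g_W0.reshape(g_W0.shape[0], -1)], axis=1)
    return G @ G.T

kernels = [empirical_ntk(W0, w1, X)]
for t in range(1, steps + 1):                    # full-batch gradient descent on mean squared error
    phi = np.tanh(X @ W0)
    err = (phi @ w1 - y) / P
    w1 = w1 - lr * phi.T @ err
    W0 = W0 - lr * X.T @ (np.outer(err, w1) * (1 - phi**2))
    if t % 250 == 0:
        kernels.append(empirical_ntk(W0, w1, X))

# Relative Frobenius change of the kernel between consecutive checkpoints;
# values well above zero late in training indicate the after-kernel is still evolving.
for a, b in zip(kernels[:-1], kernels[1:]):
    print(np.linalg.norm(b - a) / np.linalg.norm(a))
```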