2020
DOI: 10.48550/arxiv.2011.14522
Preprint

Feature Learning in Infinite-Width Neural Networks

Abstract: As its width tends to infinity, a deep neural network's behavior under gradient descent can become simplified and predictable (e.g. given by the Neural Tangent Kernel (NTK)), if it is parametrized appropriately (e.g. the NTK parametrization). However, we show that the standard and NTK parametrizations of a neural network do not admit infinite-width limits that can learn features, which is crucial for pretraining and transfer learning such as with BERT. We propose simple modifications to the standard parametriz…
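For orientation, here is a minimal sketch of the two parametrizations the abstract contrasts, written for a single layer with fan-in n acting on an input x; the symbols W, x, h are generic placeholders, and the paper's proposed feature-learning modification is not reproduced here.

```latex
% Standard parametrization: the width dependence sits in the initialization variance.
h = W x, \qquad W_{ij} \sim \mathcal{N}\!\left(0, \tfrac{1}{n}\right)

% NTK parametrization: the width dependence is an explicit factor in the forward pass.
h = \tfrac{1}{\sqrt{n}}\, W x, \qquad W_{ij} \sim \mathcal{N}(0, 1)
```

Both choices give preactivations of order one at initialization; they differ in how gradient updates scale with n, which is what decides whether features can move in the infinite-width limit.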

Cited by 31 publications (61 citation statements)
References 31 publications
“…and similarly for S_source. Therefore, the generating function for h, z factorizes into a product of N factors, which we shall denote Z_i[j,], allowing us to express the total average partition function in the form…”
Section: Self-averaging Random Network (mentioning)
confidence: 99%
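A hedged schematic of the factorization this excerpt describes (the source current j and the single-site functional Z_i are placeholders; the cited paper's exact index structure is not recoverable from this snippet): independence across the N sites lets the averaged generating function split into a product.

```latex
\overline{Z}[j] \;=\; \prod_{i=1}^{N} Z_i[j]
\qquad\Longrightarrow\qquad
\ln \overline{Z}[j] \;=\; \sum_{i=1}^{N} \ln Z_i[j]
```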
“…The cumulant term requires moving the summation through both the exponential and the log, i.e., ln⟨e^{Σ_i y_i}⟩ = ln ∏_i ⟨e^{y_i}⟩ = Σ_i ln⟨e^{y_i}⟩. Note that while we require N to be sufficiently large for the Gaussian distributions to be valid, the factorization itself holds even at finite N, since it relies only on each term in the summations over z_i², φ(h_i)², and ϕ(x_i)² being identical, which is true by virtue of the integrals over h_i, z_i in (2.29).…”
Section: Mean-field Theory Approximation (mentioning)
confidence: 99%
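A hedged worked version of the finite-N point made in this excerpt, assuming the per-site variables are independent and identically distributed (the notation ⟨·⟩ for the average and s for the per-site term are placeholders, not the cited paper's):

```latex
\Bigl\langle e^{\sum_{i=1}^{N} s(h_i, z_i)} \Bigr\rangle
\;=\; \prod_{i=1}^{N} \bigl\langle e^{s(h_i, z_i)} \bigr\rangle
\;=\; \bigl\langle e^{s(h, z)} \bigr\rangle^{N},
```

which holds for any finite N; only the subsequent Gaussian approximation of each factor needs N to be large.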
“…The theory of neural tangent kernel (NTK) has been deemed an important tool to understand deep neural networks [15][16][17][18][19][20][21]. In the large-width limit, a generic neural network becomes nearly Gaussian when averaging over the initial weights and biases, and the learning capabilities become predictable.…”
Section: Introduction (mentioning)
confidence: 99%
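To make "predictable in the large-width limit" concrete, below is a small, hedged sketch (not code from the cited papers; the function names are made up for illustration) that computes the empirical NTK of a finite-width MLP in the NTK parametrization using JAX. As the hidden width grows, this random kernel concentrates around its deterministic infinite-width limit.

```python
import jax
import jax.numpy as jnp

def init_params(key, widths):
    """NTK parametrization: weights are N(0, 1); the 1/sqrt(fan_in) factor lives in the forward pass."""
    params = []
    for n_in, n_out in zip(widths[:-1], widths[1:]):
        key, sub = jax.random.split(key)
        params.append(jax.random.normal(sub, (n_in, n_out)))
    return params

def mlp(params, x):
    """Scalar-output MLP with tanh hidden layers, NTK-parametrized."""
    h = x
    for W in params[:-1]:
        h = jnp.tanh(h @ W / jnp.sqrt(W.shape[0]))
    W = params[-1]
    return (h @ W / jnp.sqrt(W.shape[0])).squeeze(-1)

def empirical_ntk(params, x1, x2):
    """Theta(x1, x2)[a, b] = sum over all parameters of df(x1[a])/dtheta * df(x2[b])/dtheta."""
    j1 = jax.tree_util.tree_leaves(jax.jacobian(mlp)(params, x1))  # leaves: (batch1, *W.shape)
    j2 = jax.tree_util.tree_leaves(jax.jacobian(mlp)(params, x2))
    flat1 = [a.reshape(a.shape[0], -1) for a in j1]
    flat2 = [b.reshape(b.shape[0], -1) for b in j2]
    return sum(a @ b.T for a, b in zip(flat1, flat2))

key = jax.random.PRNGKey(0)
params = init_params(key, [3, 512, 512, 1])   # widen the hidden layers to see the kernel concentrate
x = jax.random.normal(jax.random.PRNGKey(1), (5, 3))
print(empirical_ntk(params, x, x).shape)      # (5, 5) Gram matrix
```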
“…This scaling allows for nonlinear feature learning, unlike the NTK scaling [8]. While there are other scalings that also admit a certain sense of feature learning [7,31], the standard parameterization used in practice is known, in the infinite-width limit, to degenerate into NTK-like behaviors, which are not expected of practical finite-but-large-width neural networks [13,31]. In other words, all infinite-width scalings that display feature learning are only proxies of practical networks.…”
Section: Introduction (mentioning)
confidence: 99%
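As a hedged rule of thumb behind the "degenerates into NTK-like behaviors" statement (standard heuristic scalings, not formulas quoted from these papers): under NTK-style scaling the change of a hidden feature after a gradient step vanishes with width, whereas a feature-learning parametrization such as the paper's μP keeps it of order one.

```latex
\Delta h_{\mathrm{NTK}}(x) = \Theta\!\left(n^{-1/2}\right) \xrightarrow{\;n \to \infty\;} 0,
\qquad
\Delta h_{\mu\mathrm{P}}(x) = \Theta(1).
```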