Global convergence of neuron birth-death dynamics

Rotskoff, Grant M.; Jelassi, Samy; Bruna, Joan; Vanden‐Eijnden, Eric

doi:10.48550/arxiv.1902.01843

Cited by 17 publications

(24 citation statements)

References 13 publications

(16 reference statements)


“…In this case, it is known that convergence to global optimum is possible for gradient descent or SGD [MMN18,CB18b,RVE18], with a potentially exponential rate [JMM19] and a dimension-independent width [MMM19]. This line of works has also inspired new training algorithms [WLLM18,RJBVE19]. Most works focus on fully-connected networks on the Euclidean space and utilize certain convexity properties.…”

Section: Further Related Work (mentioning)

confidence: 99%

A Rigorous Framework for the Mean Field Limit of Multilayer Neural Networks

Nguyen, Pham

2020

Preprint


We develop a mathematically rigorous framework for multilayer neural networks in the mean field regime. As the network's width increases, the network's learning trajectory is shown to be well captured by a meaningful and dynamically nonlinear limit (the mean field limit), which is characterized by a system of ODEs. Our framework applies to a broad range of network architectures, learning dynamics and network initializations. Central to the framework is the new idea of a neuronal embedding, which comprises a non-evolving probability space that allows one to embed neural networks of arbitrary widths. We demonstrate two applications of our framework. Firstly, the framework gives a principled way to study the simplifying effects that independent and identically distributed initializations have on the mean field limit. Secondly, we prove a global convergence guarantee for two-layer and three-layer networks. Unlike previous works that rely on convexity, our result requires a certain universal approximation property, which is a distinctive feature of infinite-width neural networks. To the best of our knowledge, this is the first time global convergence is established for neural networks of more than two layers in the mean field regime.


“…The global convergence of this PDE for interaction kernels arising from single-hidden layer neural networks has been established under mild assumptions in [22,8,25]. Although the conditions for [Footnote 1: To be precise, we should replace the gradient ∇L(z) with the Clarke subdifferential ∂L(z) [9], since L(z) is only piecewise smooth.]…”

Section: Dynamics In the Canonical Parameters (mentioning)

confidence: 99%

“…global convergence hold in the mean field limit m → ∞, a propagation-of-chaos argument from statistical mechanics gives Central Limit Theorems for the behavior of finite-particle systems as fluctuations of order 1/√m around the mean-field solution; see [26,25] for further details.…”

Section: Dynamics In the Canonical Parameters (mentioning)

confidence: 99%

Gradient Dynamics of Shallow Univariate ReLU Networks

Williams, Trager, Silva et al.

2019

Preprint

Self Cite


We present a theoretical and empirical study of the gradient dynamics of overparameterized shallow ReLU networks with one-dimensional input, solving least-squares interpolation. We show that the gradient dynamics of such networks are determined by the gradient flow in a non-redundant parameterization of the network function. We examine the principal qualitative features of this gradient flow. In particular, we determine conditions for two learning regimes: kernel and adaptive, which depend both on the relative magnitude of initialization of weights in different layers and the asymptotic behavior of initialization coefficients in the limit of large network widths. We show that learning in the kernel regime yields smooth interpolants, minimizing curvature, and reduces to cubic splines for uniform initializations. Learning in the adaptive regime favors instead linear splines, where knots cluster adaptively at the sample points.


“…In particular, it is shown that under a suitable scaling, as the widths tend to infinity, the neural network's learning dynamics converges to a nonlinear deterministic limit, known as the mean field (MF) limit [14,17]. This line of works starts with analyses of the shallow case under various settings and has led to a number of nontrivial exciting results [18,14,5,23,25,9,22,19,29,24,30,12,1,16]. The generalization to multilayer neural networks, already much more conceptually and technically challenging, has also been met with serious efforts from different groups of authors, with various novel ideas and insights [15,17,20,2,26,6].…”

Section: Introduction (mentioning)

confidence: 99%

Limiting fluctuation and trajectorial stability of multilayer neural networks with mean field training

Pham, Nguyen

2021

Preprint


The mean field theory of multilayer neural networks centers around a particular infinite-width scaling, in which the learning dynamics is shown to be closely tracked by the mean field limit. A random fluctuation around this infinite-width limit is expected from a large-width expansion to the next order. This fluctuation has been studied only in the case of shallow networks, where previous works employ heavily technical notions or additional formulation ideas amenable only to that case. Treatment of the multilayer case has been missing, with the chief difficulty in finding a formulation that must capture the stochastic dependency across not only time but also depth. In this work, we initiate the study of the fluctuation in the case of multilayer networks, at any network depth. Leveraging the neuronal embedding framework recently introduced by Nguyen and Pham [17], we systematically derive a system of dynamical equations, called the second-order mean field limit, that captures the limiting fluctuation distribution. We demonstrate through the framework the complex interaction among neurons in this second-order mean field limit, the stochasticity with cross-layer dependency and the nonlinear time evolution inherent in the limiting fluctuation. A limit theorem is proven to relate quantitatively this limit to the fluctuation realized by large-width networks. We apply the result to show a stability property of gradient descent mean field training: in the large-width regime, along the training trajectory, it progressively biases towards a solution with "minimal fluctuation" (in fact, vanishing fluctuation) in the learned output function, even after the network has been initialized at or has converged (sufficiently fast) to a global optimum. This extends a similar phenomenon previously shown only for shallow networks with a squared loss in the empirical risk minimization setting, to multilayer networks with a loss function that is not necessarily convex in a more general setting.
