2019
DOI: 10.48550/arxiv.1902.01843
Preprint

Global convergence of neuron birth-death dynamics

Abstract: Neural networks with a large number of parameters admit a mean-field description, which has recently served as a theoretical explanation for the favorable training properties of "overparameterized" models. In this regime, gradient descent obeys a deterministic partial differential equation (PDE) that converges to a globally optimal solution for networks with a single hidden layer under appropriate assumptions. In this work, we propose a non-local mass transport dynamics that leads to a modified PDE with the sa…
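
To make the abstract's setup concrete, here is a minimal, self-contained sketch of a particle discretization of gradient descent combined with a birth-death (resampling) step for a single-hidden-layer ReLU network on synthetic data. The dataset, the mean-field 1/m scaling, the step sizes, and in particular the multinomial resampling rule used to stand in for the birth-death term are illustrative assumptions, not the exact dynamics proposed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, n = 5, 256, 512                        # input dim, particles (hidden units), samples
X = rng.normal(size=(n, d))
y = np.maximum(X @ rng.normal(size=d), 0.0)  # synthetic target: a single ReLU feature

a = rng.normal(size=m)                       # output weight of each particle
W = rng.normal(size=(m, d))                  # input weights of each particle

def forward(a, W):
    H = np.maximum(X @ W.T, 0.0)             # (n, m) hidden activations
    return H, H @ a / len(a)                 # mean-field scaling: f(x) = (1/m) sum_i a_i relu(w_i.x)

lr, tau, resample_every = 0.5, 0.3, 25
for step in range(2000):
    H, f = forward(a, W)
    r = f - y                                # residuals
    # Transport step: each particle follows minus the gradient of its potential
    # V(a_i, w_i) = (1/n) sum_x r(x) * a_i * relu(w_i.x).
    mask = (X @ W.T > 0).astype(float)       # ReLU active-set indicator, shape (n, m)
    grad_a = H.T @ r / n
    grad_W = ((mask * r[:, None]).T @ X) * a[:, None] / n
    a -= lr * grad_a
    W -= lr * grad_W
    # Birth-death step: particles with above-average potential (they increase the loss)
    # die, particles with below-average potential replicate.  Implemented here as
    # mass-conserving multinomial resampling with weights exp(-tau * (V_i - mean V)),
    # an illustrative stand-in for the paper's non-local mass-transport term.
    if step % resample_every == 0:
        V = a * (H.T @ r) / n
        w_bd = np.exp(-tau * (V - V.mean()))
        idx = rng.choice(m, size=m, p=w_bd / w_bd.sum())
        a, W = a[idx].copy(), W[idx].copy()

_, f = forward(a, W)
print("final mse:", np.mean((f - y) ** 2))
```

The resampling step keeps the number of particles fixed while moving mass away from hidden units whose first-variation potential is above average, which is the qualitative effect a birth-death term adds on top of the plain transport (gradient) dynamics.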

Cited by 17 publications (24 citation statements)
References 13 publications (16 reference statements)
“…In this case, it is known that convergence to global optimum is possible for gradient descent or SGD [MMN18,CB18b,RVE18], with a potentially exponential rate [JMM19] and a dimension-independent width [MMM19]. This line of works has also inspired new training algorithms [WLLM18,RJBVE19]. Most works focus on fully-connected networks on the Euclidean space and utilize certain convexity properties.…”
Section: Further Related Work
confidence: 99%
“…The global convergence of this PDE for interaction kernels arising from single-hidden layer neural networks has been established under mild assumptions in [22,8,25]. Although the conditions for…”
[Footnote 1: To be precise, we should replace the gradient ∇L(z) with the Clarke subdifferential ∂L(z) [9], since L(z) is only piecewise smooth.]
Section: Dynamics In the Canonical Parameters
confidence: 99%
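The footnote's point that the gradient must be read as a Clarke subdifferential, because the loss is only piecewise smooth, can be seen on the simplest piecewise-smooth example, the ReLU itself. The snippet below (illustrative, not taken from the cited paper) computes the two one-sided derivatives of max(z, 0) at the kink z = 0; their convex hull [0, 1] is the Clarke subdifferential there, and a (sub)gradient method simply picks one element of it.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def one_sided_derivatives(f, z0, h=1e-7):
    # Finite-difference estimates of the left and right derivatives at z0.
    left = (f(z0) - f(z0 - h)) / h
    right = (f(z0 + h) - f(z0)) / h
    return left, right

left, right = one_sided_derivatives(relu, 0.0)
print(left, right)  # ~0.0 and ~1.0: the ordinary gradient does not exist at 0
# The Clarke subdifferential of relu at 0 is the convex hull of the limiting
# derivatives, i.e. the whole interval [0, 1]; any element of it is a valid
# (sub)gradient for piecewise-smooth optimization.
```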
“…global convergence hold in the mean field limit m → ∞, a propagation-of-chaos argument from statistical mechanics gives Central Limit Theorems for the behavior of finite-particle systems as fluctuations of order 1/√m around the mean-field solution; see [26,25] for further details.…”
Section: Dynamics In the Canonical Parameters
confidence: 99%
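The 1/√m fluctuation scale in this statement can be checked numerically in the simplest regime, the network at an i.i.d. random initialization, where it reduces to an ordinary central limit theorem for the empirical average over particles. The sketch below (illustrative only, and confined to initialization rather than the full training dynamics covered by the propagation-of-chaos argument) estimates the spread of the finite-width output around its mean-field value for several widths and checks that √m times that spread stays roughly constant.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_test, n_trials = 5, 50, 200
X = rng.normal(size=(n_test, d))            # fixed test inputs

def finite_width_output(m):
    # f_m(x) = (1/m) sum_i a_i relu(w_i . x) with i.i.d. particles (a_i, w_i);
    # since E[a_i] = 0, the mean-field limit of f_m at initialization is identically zero.
    a = rng.normal(size=m)
    W = rng.normal(size=(m, d))
    return np.maximum(X @ W.T, 0.0) @ a / m

for m in [100, 400, 1600, 6400]:
    outs = np.stack([finite_width_output(m) for _ in range(n_trials)])
    fluct = outs.std(axis=0).mean()         # typical deviation from the mean-field value
    print(f"m={m:5d}  fluctuation ~ {fluct:.4f}   sqrt(m) * fluctuation ~ {np.sqrt(m) * fluct:.3f}")
```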
“…In particular, it is shown that under a suitable scaling, as the widths tend to infinity, the neural network's learning dynamics converges to a nonlinear deterministic limit, known as the mean field (MF) limit [14,17]. This line of works starts with analyses of the shallow case under various settings and has led to a number of nontrivial exciting results [18,14,5,23,25,9,22,19,29,24,30,12,1,16]. The generalization to multilayer neural networks, already much more conceptually and technically challenging, has also been met with serious efforts from different groups of authors, with various novel ideas and insights [15,17,20,2,26,6].…”
Section: Introduction
confidence: 99%