2022
DOI: 10.48550/arXiv.2201.12052
Preprint

Improved Overparametrization Bounds for Global Convergence of Stochastic Gradient Descent for Shallow Neural Networks

Abstract: We study the overparametrization bounds required for the global convergence of the stochastic gradient descent algorithm for a class of one-hidden-layer feed-forward neural networks, considering most of the activation functions used in practice, including ReLU. We improve the existing state-of-the-art results in terms of the required hidden layer width. We introduce a new proof technique combining nonlinear analysis with properties of random initializations of the network. First, we establish the global convergenc…
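To make the setting of the abstract concrete, the sketch below trains a one-hidden-layer ReLU network with plain single-sample SGD under a standard random initialization. It is only an illustration of the training setup, with assumed hyperparameters (width m, learning rate, synthetic data); it is not the paper's construction and does not reproduce its width bounds.

```python
# Minimal sketch of the setting (assumed hyperparameters, not the paper's bounds):
# a one-hidden-layer ReLU network f(x) = sum_k a_k * relu(w_k . x) trained with
# single-sample SGD on a small synthetic regression task. The hidden width m
# plays the role of the overparametrization parameter.
import numpy as np

rng = np.random.default_rng(0)

n, d, m = 20, 5, 2000                      # samples, input dim, hidden width
X = rng.standard_normal((n, d))
y = np.sin(X @ rng.standard_normal(d))     # arbitrary smooth regression target

# Random initialization: Gaussian inner weights, +/-1 outer weights with the
# usual 1/sqrt(m) output scaling; only the inner weights W are trained here.
W = rng.standard_normal((m, d))
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)

def predictions(W):
    return np.maximum(X @ W.T, 0.0) @ a    # network outputs on all n samples

lr, steps = 0.2, 3000
for _ in range(steps):
    i = rng.integers(n)                    # pick one sample (stochastic gradient)
    h = np.maximum(W @ X[i], 0.0)          # hidden ReLU activations
    err = a @ h - y[i]                     # residual on the sampled point
    # gradient of 0.5 * err**2 with respect to the inner weights W
    W -= lr * err * (a * (h > 0))[:, None] * X[i][None, :]

print("final training MSE:", np.mean((predictions(W) - y) ** 2))
```

The 1/sqrt(m) output scaling keeps the initial network outputs at a comparable scale as the hidden layer is widened, which is the regime in which overparametrization-based convergence analyses of this kind are typically stated.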

Cited by 1 publication (2 citation statements)
References 10 publications (19 reference statements)

“…For example, Chizat and Bach (2018) transform the original nonconvex minimization problem into a convex one and show that the limit points of a suitable gradient algorithm are global minima of the convex problem. In the same spirit, Polaczyk and Cyranka (2022) show that combining over-parametrization and random initialization techniques produces almost convex and smooth objective functions that enable neural networks to avoid local minima. Du et al. (2019) and Polaczyk and Cyranka (2022) also rely on over-parametrization to prove the convergence of gradient descent for feedforward and ResNet neural networks.…”
Section: GANs Training and Convexity
Mentioning confidence: 89%

“…In the same spirit, Polaczyk and Cyranka (2022) show that combining over-parametrization and random initialization techniques produces almost convex and smooth objective functions that enable neural networks to avoid local minima. Du et al. (2019) and Polaczyk and Cyranka (2022) also rely on over-parametrization to prove the convergence of gradient descent for feedforward and ResNet neural networks. More generally, Ghadimi and Lan (2016) and Ghadimi and Lan (2013) extend the convergence results of gradient-based methods to a class larger than the set of convex objective functions.…”
Section: GANs Training and Convexity
Mentioning confidence: 89%
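The "almost convex and smooth" behaviour these citation statements refer to is commonly illustrated by a lazy-training effect: as the hidden width grows, (stochastic) gradient descent moves the randomly initialized weights relatively less, so the objective is nearly linear in the weights along the training trajectory. The script below is a numerical illustration of that effect under assumed hyperparameters, not a reproduction of any of the cited proofs.

```python
# Illustration (assumed setup): relative movement of the hidden-layer weights
# from their random initialization after SGD training, for increasing width m.
# Smaller relative movement at larger width is the "lazy"/near-convex regime
# exploited by overparametrization-based convergence analyses.
import numpy as np

rng = np.random.default_rng(1)
n, d = 20, 5
X = rng.standard_normal((n, d))
y = np.sin(X @ rng.standard_normal(d))

def relative_movement(m, steps=3000, lr=0.2):
    W0 = rng.standard_normal((m, d))                   # random initialization
    a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)   # fixed outer weights
    W = W0.copy()
    for _ in range(steps):
        i = rng.integers(n)
        h = np.maximum(W @ X[i], 0.0)
        err = a @ h - y[i]
        W -= lr * err * (a * (h > 0))[:, None] * X[i][None, :]
    return np.linalg.norm(W - W0) / np.linalg.norm(W0)

for m in (50, 500, 5000):
    print(f"width {m:5d}: relative distance from init = {relative_movement(m):.4f}")
```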