2020
DOI: 10.48550/arxiv.2003.01291
Preprint

Overall error analysis for the training of deep neural networks via stochastic gradient descent with random initialisation

Abstract: In spite of the accomplishments of deep learning based algorithms in numerous applications and very broad corresponding research interest, at the moment there is still no rigorous understanding of the reasons why such algorithms produce useful results in certain situations. A thorough mathematical analysis of deep learning based algorithms seems to be crucial in order to improve our understanding and to make their implementation more effective and efficient. In this article we provide a mathematically rigorous…
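The article concerns the overall error of training deep neural networks with plain SGD started from random initialisations. As a rough, non-authoritative illustration of the kind of training procedure being analysed, the sketch below runs SGD on a one-hidden-layer ReLU network for a least-squares regression problem from several independent random initialisations and keeps the realisation with the smallest empirical risk; the target function, network width, learning rate, and every other choice here are assumptions made only for this example and are not taken from the paper.

```python
# Illustrative sketch (not the paper's algorithm verbatim): train a shallow
# ReLU network on a least-squares regression task with plain SGD, restarted
# from several independent random initialisations, and keep the realisation
# with the smallest empirical risk.  All hyper-parameters are assumptions.
import numpy as np

rng = np.random.default_rng(0)

def target(x):
    # Hypothetical target function to be learned (example only).
    return np.sin(2.0 * np.pi * x)

def init_params(width):
    # Random (Gaussian) initialisation of a one-hidden-layer ReLU network.
    return {
        "W1": rng.normal(0.0, 1.0, size=(width, 1)),
        "b1": rng.normal(0.0, 1.0, size=(width,)),
        "W2": rng.normal(0.0, 1.0 / np.sqrt(width), size=(1, width)),
        "b2": np.zeros(1),
    }

def forward(p, x):
    # x has shape (n, 1); returns predictions (n, 1) plus the hidden
    # pre-activations and activations needed for the gradient.
    z = x @ p["W1"].T + p["b1"]           # (n, width)
    h = np.maximum(z, 0.0)                # ReLU
    y = h @ p["W2"].T + p["b2"]           # (n, 1)
    return y, z, h

def empirical_risk(p, x, y_true):
    y, _, _ = forward(p, x)
    return float(np.mean((y - y_true) ** 2))

def sgd_run(width=32, steps=5000, batch=16, lr=1e-2, n_train=256):
    # One SGD run from a fresh random initialisation.
    x = rng.uniform(0.0, 1.0, size=(n_train, 1))
    y_true = target(x)
    p = init_params(width)
    for _ in range(steps):
        idx = rng.integers(0, n_train, size=batch)
        xb, yb = x[idx], y_true[idx]
        y, z, h = forward(p, xb)
        err = 2.0 * (y - yb) / batch      # d(mean squared error)/dy
        # Backpropagation by hand for the two-layer network.
        gW2 = err.T @ h
        gb2 = err.sum(axis=0)
        dh = err @ p["W2"]
        dz = dh * (z > 0.0)
        gW1 = dz.T @ xb
        gb1 = dz.sum(axis=0)
        for name, g in (("W1", gW1), ("b1", gb1), ("W2", gW2), ("b2", gb2)):
            p[name] -= lr * g
    return p, empirical_risk(p, x, y_true)

# Several independent random initialisations; keep the best realisation.
runs = [sgd_run() for _ in range(5)]
best_params, best_risk = min(runs, key=lambda r: r[1])
print(f"smallest empirical risk over 5 initialisations: {best_risk:.4f}")
```

Selecting the best of several independent runs is only meant to mirror, at a toy level, the role that random initialisation plays in the analysis; a faithful reproduction would follow the article's precise assumptions on the architecture, the data distribution, and the step sizes.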

Cited by 7 publications (22 citation statements)
References 26 publications
“…This, (24), (25), and the fact that for all x ∈ [a, b]^d it holds that 0 (x) = 0 prove that for all φ, ψ ∈ K we have that…”
Section: Local Lipschitz Continuity Properties of the True Risk Funct… (mentioning)
confidence: 79%
“…For more detailed overviews and further references on SGD optimization schemes we refer, e.g., to [8], [18, Section 1.1], [23, Section 1], and [39]. The effect of random initializations in the training of ANNs was studied, e.g., in [6, 20, 21, 25, 32, 42] and the references mentioned therein. Another promising branch of research has investigated the convergence of SGD for the training of ANNs in the so-called overparametrized regime, where the number of ANN parameters has to be sufficiently large.…”
Section: Introduction (mentioning)
confidence: 99%
“…the number of non-zero weights and biases). Moreover, many bounds on the generalization error require an estimate of the network width [3, 26].…”
Section: Approximation of Analytic Functions (mentioning)
confidence: 99%
“…Hence, in that case our results show that a SGD scheme associated with the training of the network converges almost surely on the event of staying local. Concerning the training of neural networks via SGD we refer the reader to [BM11] and [JW20]. Related target functions (loss landscapes) are analysed in [Coo18], [Ngu19], [Coo20], [PRV20] and [QZX20].…”
Section: Introduction (mentioning)
confidence: 99%