2021
DOI: 10.48550/arxiv.2104.00277
Preprint

A proof of convergence for stochastic gradient descent in the training of artificial neural networks with ReLU activation for constant target functions

Arnulf Jentzen,
Adrian Riekert

Abstract: In this article we study the stochastic gradient descent (SGD) optimization method in the training of fully-connected feedforward artificial neural networks with ReLU activation. The main result of this work proves that the risk of the SGD process converges to zero if the target function under consideration is constant. In the established convergence result the considered artificial neural networks consist of one input layer, one hidden layer, and one output layer (with d ∈ N neurons on the input layer, H ∈ N …
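To make the setting of the abstract concrete, the following is a minimal sketch (not the authors' code) of plain SGD training of a shallow ReLU network on a constant target function. The architecture (d input neurons, H hidden ReLU neurons, one linear output) follows the paper's setting; the constant value c, the uniform input distribution, the learning rate, the batch size, and the random initialization are illustrative assumptions.

```python
# Minimal sketch: SGD for a one-hidden-layer ReLU network with a constant
# target function f(x) = c. All specific hyperparameters below are assumed
# for illustration only.
import numpy as np

rng = np.random.default_rng(0)

d, H = 3, 16            # input dimension and hidden width (illustrative)
c = 2.0                 # constant target value (assumed)
lr, steps, batch = 0.01, 5000, 32

# ANN parameters: hidden weights/biases and output weights/bias
W = rng.normal(scale=1.0 / np.sqrt(d), size=(H, d))
b = np.zeros(H)
v = rng.normal(scale=1.0 / np.sqrt(H), size=H)
a = 0.0

def forward(X):
    """Realization of the ANN: x -> v^T ReLU(W x + b) + a."""
    Z = X @ W.T + b            # pre-activations, shape (batch, H)
    A = np.maximum(Z, 0.0)     # ReLU activation
    return A, Z, A @ v + a

for step in range(steps):
    # i.i.d. inputs; uniform on [0,1]^d is an assumption for illustration
    X = rng.uniform(size=(batch, d))
    A, Z, out = forward(X)
    err = out - c                          # residual w.r.t. constant target
    risk = 0.5 * np.mean(err ** 2)         # empirical squared-error risk

    # Gradients of the empirical risk (ReLU "derivative" taken as 1{z > 0})
    grad_out = err / batch                 # shape (batch,)
    grad_v = A.T @ grad_out
    grad_a = grad_out.sum()
    grad_hidden = np.outer(grad_out, v) * (Z > 0)   # shape (batch, H)
    grad_W = grad_hidden.T @ X
    grad_b = grad_hidden.sum(axis=0)

    # Plain SGD update
    W -= lr * grad_W
    b -= lr * grad_b
    v -= lr * grad_v
    a -= lr * grad_a

    if step % 1000 == 0:
        print(f"step {step:5d}  empirical risk {risk:.6f}")
```

Under the paper's main result one would expect the printed empirical risk to tend to zero in such a constant-target setting, provided the learning rates and initialization satisfy the assumptions of the theorem; the sketch above makes no attempt to match those assumptions exactly.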

Cited by 8 publications (17 citation statements)
References 24 publications
“…Note that (18) and the assumption that v ∈ R^{d+1} \ {0} imply that v_{d+1} = 0. Moreover, observe that (18) shows that for all u = (u_1, …”
Section: Continuous Dependence of Active Neuron Regions on ANN Parame... (mentioning)
confidence: 99%
“…and [18, Proposition 2.3]) establishes items (i), (ii), (iii), and (iv). The proof of Proposition 2.2 is thus complete.…”
Section: Mathematical Description of Artificial Neural Networks (ANNs) (mentioning)
confidence: 99%
“…Moreover, non-global local minimum points could be found in the risk landscape of ANNs with one hidden layer and ReLU activation in special student-teacher setups with the probability distribution of the input data given by the normal distribution (see Safran & Shamir [31]). In other cases, where the target function has a very simple form, the critical points of the risk landscape are fully characterized and thus all local minimum points are known (see Cheridito et al. [2, Corollary 2.15], Cheridito et al. [3], and Jentzen & Riekert [17, Corollary 2.11]). Additionally, in the case of ANNs with linear activation and finitely many training data it was shown that all local minimum points of the risk function corresponding to the squared error loss are global minimum points (cf.…”
Section: Introduction (mentioning)
confidence: 99%