2021
DOI: 10.48550/arxiv.2107.04479
Preprint
Convergence analysis for gradient flows in the training of artificial neural networks with ReLU activation

Arnulf Jentzen,
Adrian Riekert

Abstract: Gradient descent (GD) type optimization schemes are the standard methods to train artificial neural networks (ANNs) with rectified linear unit (ReLU) activation. Such schemes can be considered as discretizations of gradient flows (GFs) associated to the training of ANNs with ReLU activation, and most of the key difficulties in the mathematical convergence analysis of GD type optimization schemes in the training of ANNs with ReLU activation seem to be already present in the dynamics of the corresponding GF differential equations.
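A minimal sketch of the discretization relationship the abstract describes, in assumed notation not taken from the paper: for a risk function $\mathcal{L}\colon \mathbb{R}^d \to \mathbb{R}$, a GF trajectory $\Theta\colon [0,\infty) \to \mathbb{R}^d$ solves the ordinary differential equation

\[
\Theta'(t) = -\nabla \mathcal{L}(\Theta(t)), \qquad \Theta(0) = \theta_0,
\]

and GD with learning rate $\gamma > 0$ is its explicit Euler discretization,

\[
\theta_{n+1} = \theta_n - \gamma \nabla \mathcal{L}(\theta_n), \qquad n \geq 0,
\]

so that $\theta_n$ approximates $\Theta(n\gamma)$. Since the ReLU function is not differentiable at $0$, the risk $\mathcal{L}$ is in general not differentiable either, and a suitably defined generalized gradient takes the place of $\nabla \mathcal{L}$ (see the citation statements below).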

Cited by 4 publications (11 citation statements); references 11 publications. Citation statements, ordered by relevance:
“…As in [25, Setting 2.1 and Proposition 2.3] (cf. also [10,24]), we accomplish this, first, by approximating the ReLU activation function through continuously differentiable functions which converge pointwise to the ReLU activation function and whose derivatives converge pointwise to the left derivative of the ReLU activation function and, thereafter, by specifying the generalized gradient function as the limit of the gradients of the approximated risk functions; see (1.1) and (1.3) in Theorem 1.1 and (1.7) and (1.8) in Theorem 1.2 for details.…”
Section: Introduction (mentioning; confidence: 99%)
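As an illustration of the construction described in this statement, here is one concrete family with the stated properties (an assumed example for exposition; the specific approximating functions used in [25] may differ): for $r \in \mathbb{N}$, set

\[
A_r(x) =
\begin{cases}
0, & x \leq 0, \\[2pt]
\tfrac{r}{2}\,x^2, & 0 < x < \tfrac{1}{r}, \\[2pt]
x - \tfrac{1}{2r}, & x \geq \tfrac{1}{r}.
\end{cases}
\]

Each $A_r$ is continuously differentiable with $A_r'(x) = \min\{\max\{rx, 0\}, 1\}$. As $r \to \infty$ we have $A_r(x) \to \max\{x, 0\}$ pointwise and $A_r'(x) \to \mathbb{1}_{(0,\infty)}(x)$ pointwise, the latter being the left derivative of the ReLU function (equal to $0$ at $x = 0$). In this construction, the generalized gradient function arises as the limit, as $r \to \infty$, of the gradients of the risk functions in which $A_r$ replaces the ReLU activation.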
“…, of zeros of the generalized gradient function G: ℝ^4 → ℝ^4. In Corollary 5.9 in Subsection 5.6 we extend Corollary 5.8 by using [19, Item (v) in Theorem 1.1] and [11, Theorem 1.2] to establish that in the training of such ANNs we have that the risk of every non-divergent GF trajectory converges to the risk of a global minimum point provided that the initial risk is sufficiently small. The remainder of this section is organized in the following way.…”
Section: On Finitely Many Realization Functions of Critical Points (mentioning; confidence: 89%)
“…Finally, in Section 5 we prove in Corollary 5.8, in the special situation where the target function is continuous and piecewise polynomial and where both the input layer and hidden layer of the considered ANNs are one-dimensional, that there exist only finitely many different realization functions of all critical points of the risk function (of all zeros of the generalized gradient function). Theorem 1.2 above can then be shown by combining Proposition 2.12, Corollary 5.8, [19, Item (v) in Theorem 1.1], and [11, Theorem 1.2]. This is precisely the subject of Corollary 5.9 in Section 5.…”
Section: Introduction (mentioning; confidence: 84%)