2020
DOI: 10.1109/ojsp.2020.3039379

Learning Activation Functions in Deep (Spline) Neural Networks

Abstract: We develop an efficient computational solution to train deep neural networks (DNN) with free-form activation functions. To make the problem well-posed, we augment the cost functional of the DNN by adding an appropriate shape regularization: the sum of the second-order total-variations of the trainable nonlinearities. The representer theorem for DNNs tells us that the optimal activation functions are adaptive piecewise-linear splines, which allows us to recast the problem as a parametric optimization. The chall…
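As a rough illustration of the parametric formulation sketched in the abstract (learnable linear-spline activations penalized by their second-order total variation), here is a minimal PyTorch sketch. The class name, uniform knot grid, ReLU initialization, and clamped extrapolation are assumptions for illustration only, not the authors' implementation; the paper's actual method relies on a basis-function expansion and a representer-theorem argument that this sketch does not reproduce.

```python
# Minimal sketch (not the authors' released code) of a learnable piecewise-linear
# spline activation with a TV^(2) (second-order total-variation) penalty.
# Assumptions: PyTorch, a uniform knot grid, clamped extrapolation outside it.
import torch
import torch.nn as nn


class LinearSplineActivation(nn.Module):
    """Piecewise-linear activation whose knot values are trainable."""

    def __init__(self, num_knots: int = 21, x_min: float = -3.0, x_max: float = 3.0):
        super().__init__()
        self.x_min, self.x_max = x_min, x_max
        self.step = (x_max - x_min) / (num_knots - 1)
        grid = torch.linspace(x_min, x_max, num_knots)
        # Initialize the knot values to ReLU; they are then learned with the weights.
        self.coeffs = nn.Parameter(torch.clamp(grid, min=0.0))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Locate each input in the knot grid and linearly interpolate.
        t = (x.clamp(self.x_min, self.x_max) - self.x_min) / self.step
        idx = t.floor().long().clamp(max=self.coeffs.numel() - 2)
        frac = t - idx.to(x.dtype)
        return (1.0 - frac) * self.coeffs[idx] + frac * self.coeffs[idx + 1]

    def tv2(self) -> torch.Tensor:
        # TV^(2) of a linear spline = sum of absolute slope changes at the knots,
        # i.e. the l1 norm of the second finite differences of the knot values.
        d2 = self.coeffs[2:] - 2.0 * self.coeffs[1:-1] + self.coeffs[:-2]
        return d2.abs().sum() / self.step
```

During training, one would add a weighted sum of the tv2() terms of all spline modules to the data-fidelity loss, which is one way to realize the "sum of second-order total-variations" shape regularization described in the abstract.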

Cited by 28 publications (28 citation statements)
References 31 publications
“…By plugging in k_1 = n + (1, 0), k_2 = n + (0, 1) and k_3 = n + (1, 1), we express the last term in (30) as…”
Section: B. Regularization (mentioning)
confidence: 99%
“…In this case, it has been shown that neural networks with linear spline activation functions of the form (3) are optimal [24], [25]. The link between functional approaches to neural networks and splines has also been observed in various works [26], [27], [28], [29], [30].…”
Section: Introduction (mentioning)
confidence: 99%
“…Bohra et al. [6] presented an efficient computational solution to train deep neural networks with learnable AFs, specifically focusing on deep spline networks.…”
Section: Activation Functions: Previous Work (mentioning)
confidence: 99%
“…In dimension d = 1, this coincides with the known class of nonuniform linear splines which has been extensively studied from an approximation-theoretical point of view [70,71]. Motivated by this, the TV^(2) regularization has been exploited to learn activation functions of deep neural networks [37,72]. In a similar vein, the identification of the sparsest CPWL solutions of TV^(2)-regularized problems has been thoroughly studied in [38].…”
Section: Second-order Total-variation (mentioning)
confidence: 99%
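For context on the regularizer referred to in the statement above, a standard worked form (notation assumed here, not quoted from the cited text): for a one-dimensional continuous piecewise-linear function f with knots τ_1 < … < τ_K and slopes s_0, …, s_K on the successive pieces, the second-order total variation is

```latex
% Second-order total variation of a CPWL function: the total mass of its
% (distributional) second derivative, i.e. the sum of absolute slope changes.
\mathrm{TV}^{(2)}(f)
  \;=\; \big\| \mathrm{D}^{2} f \big\|_{\mathcal{M}}
  \;=\; \sum_{k=1}^{K} \lvert s_{k} - s_{k-1} \rvert .
```

Penalizing this quantity therefore favors CPWL functions with few slope changes (few active knots), which is the sparsity mechanism exploited in the works cited in the statement.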