2020
DOI: 10.1109/ojsp.2020.3039379

Learning Activation Functions in Deep (Spline) Neural Networks

Abstract: We develop an efficient computational solution to train deep neural networks (DNN) with free-form activation functions. To make the problem well-posed, we augment the cost functional of the DNN by adding an appropriate shape regularization: the sum of the second-order total-variations of the trainable nonlinearities. The representer theorem for DNNs tells us that the optimal activation functions are adaptive piecewise-linear splines, which allows us to recast the problem as a parametric optimization. The chall…
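As a rough illustration of the parametric formulation sketched in the abstract (learnable linear-spline activations penalized by their second-order total variation), here is a minimal PyTorch sketch. The class name, uniform knot grid, ReLU initialization, and clamped extrapolation are assumptions for illustration only, not the authors' implementation; the paper's actual method relies on a basis-function expansion and a representer-theorem argument that this sketch does not reproduce.

```python
# Minimal sketch (not the authors' released code) of a learnable piecewise-linear
# spline activation with a TV^(2) (second-order total-variation) penalty.
# Assumptions: PyTorch, a uniform knot grid, clamped extrapolation outside it.
import torch
import torch.nn as nn


class LinearSplineActivation(nn.Module):
    """Piecewise-linear activation whose knot values are trainable."""

    def __init__(self, num_knots: int = 21, x_min: float = -3.0, x_max: float = 3.0):
        super().__init__()
        self.x_min, self.x_max = x_min, x_max
        self.step = (x_max - x_min) / (num_knots - 1)
        grid = torch.linspace(x_min, x_max, num_knots)
        # Initialize the knot values to ReLU; they are then learned with the weights.
        self.coeffs = nn.Parameter(torch.clamp(grid, min=0.0))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Locate each input in the knot grid and linearly interpolate.
        t = (x.clamp(self.x_min, self.x_max) - self.x_min) / self.step
        idx = t.floor().long().clamp(max=self.coeffs.numel() - 2)
        frac = t - idx.to(x.dtype)
        return (1.0 - frac) * self.coeffs[idx] + frac * self.coeffs[idx + 1]

    def tv2(self) -> torch.Tensor:
        # TV^(2) of a linear spline = sum of absolute slope changes at the knots,
        # i.e. the l1 norm of the second finite differences of the knot values.
        d2 = self.coeffs[2:] - 2.0 * self.coeffs[1:-1] + self.coeffs[:-2]
        return d2.abs().sum() / self.step
```

During training, one would add a weighted sum of the tv2() terms of all spline modules to the data-fidelity loss, which is one way to realize the "sum of second-order total-variations" shape regularization described in the abstract.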

Cited by 28 publications (28 citation statements)
References 31 publications
“…By plugging in k_1 = n + (1, 0), k_2 = n + (0, 1) and k_3 = n + (1, 1), we express the last term in (30) as…”
Section: B. Regularization (mentioning)
confidence: 99%
“…In this case, it has been shown that neural networks with linear spline activation functions of the form (3) are optimal [24], [25]. The link between functional approaches to neural networks and splines has also been observed in various works [26], [27], [28], [29], [30].…”
Section: Introduction (mentioning)
confidence: 99%
“…Bohra et al. [6] presented an efficient computational solution to train deep neural networks with learnable AFs, specifically focusing on deep spline networks.…”
Section: Activation Functions: Previous Work (mentioning)
confidence: 99%
“…In dimension d = 1, this coincides with the known class of nonuniform linear splines which has been extensively studied from an approximation-theoretical point of view [70,71]. Motivated by this, the TV^(2) regularization has been exploited to learn activation functions of deep neural networks [37,72]. In a similar vein, the identification of the sparsest CPWL solutions of TV^(2)-regularized problems has been thoroughly studied in [38].…”
Section: Second-order Total-variation (mentioning)
confidence: 99%
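For context on the regularizer referred to in the statement above, a standard worked form (notation assumed here, not quoted from the cited text): for a one-dimensional continuous piecewise-linear function f with knots τ_1 < … < τ_K and slopes s_0, …, s_K on the successive pieces, the second-order total variation is

```latex
% Second-order total variation of a CPWL function: the total mass of its
% (distributional) second derivative, i.e. the sum of absolute slope changes.
\mathrm{TV}^{(2)}(f)
  \;=\; \big\| \mathrm{D}^{2} f \big\|_{\mathcal{M}}
  \;=\; \sum_{k=1}^{K} \lvert s_{k} - s_{k-1} \rvert .
```

Penalizing this quantity therefore favors CPWL functions with few slope changes (few active knots), which is the sparsity mechanism exploited in the works cited in the statement.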