2019
DOI: 10.48550/arxiv.1906.09529
Preprint
Learning Activation Functions: A new paradigm for understanding Neural Networks

Abstract: The scope of research in the domain of activation functions remains limited and centered around improving the ease of optimization or the generalization quality of neural networks (NNs). However, to develop a deeper understanding of deep learning, it becomes important to look at the non-linear component of NNs more carefully. In this paper, we aim to provide a generic form of activation function along with appropriate mathematical grounding so as to allow for insights into the working of NNs in the future. We propose …

Cited by 6 publications (9 citation statements)
References 4 publications
“…Convolutional filters are applied to the input through the convolutional layers of a CNN to compute the outputs of neurons connected to specific regions of the input. This helps extract temporal and spatial features from an image (Goyal et al, 2019).…”
Section: Convolutional Neural Network
confidence: 99%
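To make the quoted description concrete, here is a minimal NumPy sketch of a single convolutional filter sliding over local regions of an input, so that each output value depends only on the region it is connected to; the 28×28 image and the vertical-edge kernel are illustrative assumptions, not details from the cited work.

    import numpy as np

    def conv2d(image, kernel):
        # Valid 2D cross-correlation: each output neuron sees one local region of the input.
        kh, kw = kernel.shape
        oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
        out = np.zeros((oh, ow))
        for i in range(oh):
            for j in range(ow):
                region = image[i:i + kh, j:j + kw]    # local receptive field
                out[i, j] = np.sum(region * kernel)   # weighted sum over that region
        return out

    image = np.random.rand(28, 28)                            # toy grayscale image (assumed size)
    kernel = np.array([[1, 0, -1], [1, 0, -1], [1, 0, -1]])   # illustrative vertical-edge filter
    feature_map = conv2d(image, kernel)                       # 26 x 26 spatial feature map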
“…Max and average are the two popular pooling methods. A fully connected layer with 512 units is used to classify the image into different classes (Goyal et al, 2019; Kang et al, 2021). For feature-map normalization, a batch normalization layer is used.…”
Section: Convolutional Neural Network
confidence: 99%
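The layer ordering in the quoted passage (convolutional filters, batch normalization of the feature maps, max pooling, and a 512-unit fully connected classifier) can be sketched as a small PyTorch module; the input resolution, channel counts, and number of classes below are assumptions for illustration only.

    import torch
    import torch.nn as nn

    class SmallCNN(nn.Module):
        def __init__(self, num_classes=10):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 32, kernel_size=3, padding=1),  # convolutional filters
                nn.BatchNorm2d(32),                          # feature-map normalization
                nn.ReLU(),
                nn.MaxPool2d(2),                             # max pooling
            )
            self.classifier = nn.Sequential(
                nn.Flatten(),
                nn.Linear(32 * 16 * 16, 512),                # 512-unit fully connected layer
                nn.ReLU(),
                nn.Linear(512, num_classes),                 # class scores
            )

        def forward(self, x):
            return self.classifier(self.features(x))

    logits = SmallCNN()(torch.randn(1, 3, 32, 32))           # e.g. one 32x32 RGB image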
“…ω for all l, φ_1(x) = max{x, 0}, and φ_2(x) = (e^x − 1) · I_{x≤0}(x), the Kronecker network becomes a FF network with Exponential Linear Unit (ELU) activation [7] if ω_1^l = 1 for all l, and becomes a FF network with Scaled Exponential Linear Unit (SELU) activation [24] if ω_1^l = ω for all l. • If K = 1, the Kronecker network becomes a feed-forward neural network with layer-wise locally adaptive activation functions [20,21]. • If ω^l = 1 for all l and φ_k(x) = x^(k−1) for all k, the Kronecker network becomes a feed-forward neural network with self-learnable activation functions (SLAF) [11]. Similarly, a FFN with a smooth adaptive activation function [17] can be represented by a Kronecker network.…”
Section: Mathematical Setup and Kronecker Neural Network
confidence: 99%
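The SLAF case quoted above (φ_k(x) = x^(k−1) with learnable mixing coefficients) can be written as a short module in which the activation itself is a polynomial whose coefficients are trained by backpropagation; this is a minimal sketch, and the degree K, the initialization, and the class name SLAF are assumptions rather than details from [11].

    import torch
    import torch.nn as nn

    class SLAF(nn.Module):
        def __init__(self, K=3):
            super().__init__()
            self.coeffs = nn.Parameter(torch.zeros(K))  # a_1, ..., a_K, learned with the network weights
            self.coeffs.data[1] = 1.0                   # start near the identity map (assumed init)

        def forward(self, x):
            # f(x) = sum_k a_k * x^(k-1): a polynomial activation whose shape adapts during training
            return sum(a * x ** k for k, a in enumerate(self.coeffs))

    act = SLAF(K=3)
    y = act(torch.linspace(-2.0, 2.0, 5))  # applied elementwise, like any fixed activation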
“…However, there is no rule of thumb for choosing an optimal activation function. This has motivated the use of adaptive activation functions by our group and others, see [1,17,20,21,53,11,45], with varying results demonstrating superior performance over non-adaptive, fixed activation functions in various learning tasks.…”
confidence: 99%
“…While smooth activation functions such as sigmoid, logistic, or hyperbolic tangent are widely used in machine learning, they suffer from the "vanishing gradient problem" [6] because their derivatives approach zero for large inputs. Neural networks based on polynomial activation functions are an alternative [10,12,20,21,37,57], but can be numerically unstable due to large gradients for large inputs [6]. Moreover, polynomials do not approximate non-smooth functions efficiently [56], which can cause optimization issues in classification problems.…”
Section: Introduction
confidence: 99%
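Both failure modes in the quoted passage are easy to see numerically; the sketch below uses a few assumed example inputs to show the sigmoid derivative vanishing for large inputs while the derivative of a cubic activation grows without bound.

    import numpy as np

    x = np.array([0.0, 5.0, 20.0])            # assumed example pre-activations

    sigmoid = 1.0 / (1.0 + np.exp(-x))
    sigmoid_grad = sigmoid * (1.0 - sigmoid)  # approaches 0 for large x: vanishing gradients
    cubic_grad = 3.0 * x ** 2                 # derivative of x^3: grows as x grows

    print(sigmoid_grad)  # [2.5e-01, ~6.6e-03, ~2.1e-09]
    print(cubic_grad)    # [0., 75., 1200.]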