2018
DOI: 10.1016/j.neucom.2017.06.070
Adaptive activation functions in convolutional neural networks

Cited by 144 publications (77 citation statements)
References 9 publications
“…Therefore, in this work, we focus particularly on adaptive activation functions, which adapt automatically so that the network can be trained faster. Various methods have been proposed in the literature for adaptive activation functions, such as the adaptive sigmoidal activation function proposed by Yu et al [27] for multilayer feedforward NNs, while Qian et al [21] focus on learning activation functions in convolutional NNs by combining basic activation functions in a data-driven way. Multiple activation functions per neuron are proposed by Dushkoff and Ptucha [7], where individual neurons select among a multitude of activation functions.…”
Section: Introduction (mentioning)
confidence: 99%
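The data-driven combination of basic activation functions mentioned in this statement can be illustrated with a short sketch. The module below is a minimal, hypothetical PyTorch example, not the exact formulation of Qian et al [21]: it learns softmax-normalized mixing weights over a small basis of standard activations, trained jointly with the rest of the network.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveActivation(nn.Module):
    """Hypothetical adaptive activation: a learned mixture of basic activations."""

    def __init__(self):
        super().__init__()
        # One learnable mixing logit per basis activation (ReLU, tanh, sigmoid).
        self.logits = nn.Parameter(torch.zeros(3))

    def forward(self, x):
        # Softmax keeps the mixture weights positive and summing to one.
        w = torch.softmax(self.logits, dim=0)
        basis = torch.stack([F.relu(x), torch.tanh(x), torch.sigmoid(x)])
        # Broadcast the three weights over the stacked activations and sum them.
        return (w.view(-1, *([1] * x.dim())) * basis).sum(dim=0)
```

The mixing weights are shared across the whole feature map here; per-channel or per-neuron weights, as in the multiple-activations-per-neuron idea of Dushkoff and Ptucha [7], would only require giving the parameter a larger shape.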
“…In our case, the ReLU activation function is used inside each step of the convolutional layers until the last layer, since ReLU clips negative values to zero while keeping positive values unchanged. This function acts as a filter that breaks up the linearity and increases the non-linearity of the images (Qian, Liu, Liu, Wu, & San Wong, 2018). In the last layer, the sigmoid function is used, which is more appropriate when a probability is requested as the output.…”
Section: Training the Network (mentioning)
confidence: 99%
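For context, a minimal sketch of the layout that statement describes, assuming PyTorch and placeholder layer sizes: ReLU after each convolutional layer and a sigmoid on the final layer so the output can be read as a probability.

```python
import torch.nn as nn

# Illustrative architecture only; channel counts and pooling are placeholders.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),                      # clips negatives to zero, keeps positives
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(32, 1),
    nn.Sigmoid(),                   # maps the final logit to a probability in (0, 1)
)
```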
“…ELU (exponential linear unit) [20] is another popular activation function based on ReLU; it uses an exponential function for negative inputs instead of a linear function. An adaptive ELU extension, the parametric ELU (PELU), together with the mixing of different activation functions through an adaptive linear combination or a hierarchical gated combination, was shown to perform well [21].…”
Section: Adaptive Activation Functions (mentioning)
confidence: 99%
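A minimal sketch of a PELU-style activation, assuming the commonly used formulation f(x) = (a/b)·x for x ≥ 0 and a·(exp(x/b) − 1) for x < 0 with learnable positive parameters a and b; details may differ from the cited implementation.

```python
import torch
import torch.nn as nn

class PELU(nn.Module):
    """Parametric ELU sketch with two learnable, strictly positive parameters."""

    def __init__(self):
        super().__init__()
        self.a = nn.Parameter(torch.ones(1))
        self.b = nn.Parameter(torch.ones(1))

    def forward(self, x):
        # Clamp to keep both parameters strictly positive during training.
        a = self.a.clamp(min=1e-4)
        b = self.b.clamp(min=1e-4)
        # Linear branch for non-negative inputs, exponential branch for negatives.
        return torch.where(x >= 0, (a / b) * x, a * (torch.exp(x / b) - 1))
```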