2023
DOI: 10.1109/tai.2022.3208223

AdaInject: Injection-Based Adaptive Gradient Descent Optimizers for Convolutional Neural Networks

Cited by 8 publications (8 citation statements)
References 29 publications

“…As detailed in the literature (e.g. [15]), an ideal parameter optimization method should follow the rules depicted in Figure 4, where Θ is the parameter tensor, Δ is the difference of the parameters between two training iterations (see Eq. 21), and g are the gradients.…”
Section: Discussion on Adam Variants
confidence: 99%
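To make the notation in this excerpt concrete, the short PyTorch sketch below runs one training iteration in which Θ, g, and Δ all appear explicitly. The plain gradient step is only a stand-in illustration; the rules of Figure 4 and Eq. 21 are not reproduced in the excerpt.

```python
import torch

# One training iteration, only to make the notation concrete:
# theta (Θ) is the parameter tensor, g the gradients, and delta (Δ) the change
# in parameters between two iterations. The plain SGD step is a stand-in; it is
# not the rule set referred to as Figure 4 / Eq. 21 in the excerpt above.
theta = torch.randn(10, requires_grad=True)   # Θ: parameter tensor
lr = 0.1

loss = (theta ** 2).sum()                     # toy objective
loss.backward()
g = theta.grad                                # g: gradients of the loss w.r.t. Θ

with torch.no_grad():
    theta_prev = theta.clone()
    theta -= lr * g                           # one update step (plain gradient descent)
    delta = theta - theta_prev                # Δ: parameter difference between iterations
```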
“…The Angular Injection (AI) optimizer is based on AngularGrad [39] and injection [15]. It generates a score that controls the step size based on the gradient angular information from previous iterations.…”
Section: B: Mind Optimizer
confidence: 99%
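The excerpt describes the idea only at a high level. Below is a minimal sketch, assuming the angular information is summarized by the cosine between consecutive gradients and mapped to a step-size multiplier; the name angular_score and the exact mapping are illustrative choices, not the AngularGrad or Angular Injection formulas.

```python
import torch
import torch.nn.functional as F

def angular_score(g_prev: torch.Tensor, g_curr: torch.Tensor) -> torch.Tensor:
    """Illustrative score derived from the angle between consecutive gradients.

    Hedged sketch of the idea in the excerpt (a score based on gradient angular
    information that modulates the step size), not the exact AI formulation.
    """
    cos = F.cosine_similarity(g_prev.flatten(), g_curr.flatten(), dim=0)
    # Map cos in [-1, 1] to a multiplier in [0.5, 1.0]: aligned consecutive
    # gradients keep the full step, abrupt direction changes shrink it.
    # The exact mapping is an arbitrary choice for this illustration.
    return 0.75 + 0.25 * cos.clamp(-1.0, 1.0)

# Usage inside a hand-rolled update loop (step scaled by the angular score):
# theta = theta - lr * angular_score(g_prev, g) * g
```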
“…RAdam [21] computes the variance of the adaptive learning rate and uses it to stabilize training through corrections to the update formula. AdaInject [22] controls parameter updates to minimize oscillations close to the minimum. Recently, PNM [23] and AdaPNM [23] have been investigated as replacements for traditional momentum with a positive-negative momentum approach.…”
Section: Related Work A: SGD-Based Optimization Methods
confidence: 99%
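AdaInject is characterized here only by its effect (damping oscillations near the minimum). The sketch below shows one plausible way a short-term parameter change can be injected into an Adam-style second-moment estimate so that large recent updates shrink the next step; the injected term is an assumption made for illustration and is not claimed to match the exact rule in [22].

```python
import torch

def adam_step_with_injection(theta, g, m, v, delta_prev, t,
                             lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam-style step with an injection of the short-term parameter change.

    Hedged sketch: weighting the squared gradient by |delta_prev| (the recent
    parameter change) is an assumed form of the injection, chosen to illustrate
    "controlling parameter updates to damp oscillations near the minimum";
    it is not the exact AdaInject [22] rule. All arguments are plain tensors
    of the same shape except the step counter t (int >= 1).
    """
    m = beta1 * m + (1 - beta1) * g
    injected = (1.0 + delta_prev.abs()) * g * g        # assumed injection into the 2nd moment
    v = beta2 * v + (1 - beta2) * injected
    m_hat = m / (1 - beta1 ** t)                       # bias correction, as in Adam
    v_hat = v / (1 - beta2 ** t)
    theta_new = theta - lr * m_hat / (v_hat.sqrt() + eps)
    return theta_new, m, v, theta_new - theta          # last value is delta for the next call
```

A larger injected second moment yields a smaller effective step, which is consistent with the damping behavior the excerpt attributes to AdaInject.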
“…Experiments on minimizing the Rastrigin and Rosenbrock functions (https://github.com/jettify/pytorch-optimizer) showed that AdaBound achieved the highest accuracy, converging to the neighborhood of the global minimum. However, that approach is rather complex, and a much simpler method, AdamInject [64], exists; it reduces time consumption while preserving the convergence rate.…”
Section: Adam-Type Algorithms
confidence: 99%
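The benchmark mentioned above is straightforward to reproduce in outline. The sketch below minimizes the 2-D Rosenbrock function with torch.optim.Adam as a stand-in optimizer, since AdaBound and AdamInject come from the linked repository rather than core PyTorch; swapping in an optimizer from that repository would mirror the cited comparison.

```python
import torch

def rosenbrock(xy: torch.Tensor) -> torch.Tensor:
    # Classic 2-D Rosenbrock function; global minimum at (1, 1).
    x, y = xy[0], xy[1]
    return (1 - x) ** 2 + 100 * (y - x ** 2) ** 2

# torch.optim.Adam is used as a stand-in; the comparison in the excerpt swaps
# in AdaBound / AdamInject-style optimizers from the linked repository.
xy = torch.tensor([-2.0, 2.0], requires_grad=True)
opt = torch.optim.Adam([xy], lr=1e-2)

for step in range(5000):
    opt.zero_grad()
    loss = rosenbrock(xy)
    loss.backward()
    opt.step()

print(xy.detach(), loss.item())   # should approach the global minimum at (1, 1)
```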