2022
DOI: 10.1145/3544782

Scheduling Hyperparameters to Improve Generalization: From Centralized SGD to Asynchronous SGD

Abstract: This paper studies how to schedule hyperparameters to improve generalization of both centralized single-machine stochastic gradient descent (SGD) and distributed asynchronous SGD (ASGD). SGD augmented with momentum variants (e.g., heavy ball momentum (SHB) and Nesterov's accelerated gradient (NAG)) has been the default optimizer for many tasks, in both centralized and distributed environments. However, many advanced momentum variants, de…
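The abstract refers to two classical momentum variants of SGD, heavy-ball momentum (SHB) and Nesterov's accelerated gradient (NAG). As a minimal sketch of those two update rules (not the paper's proposed hyperparameter schedule), with `lr` and `beta` as illustrative values rather than settings from the paper:

```python
import numpy as np

def shb_step(w, v, grad, lr=0.1, beta=0.9):
    """One heavy-ball (SHB) step: v <- beta*v - lr*grad(w), then w <- w + v."""
    v = beta * v - lr * grad(w)
    return w + v, v

def nag_step(w, v, grad, lr=0.1, beta=0.9):
    """One Nesterov (NAG) step: the gradient is evaluated at the
    look-ahead point w + beta*v instead of at w."""
    v = beta * v - lr * grad(w + beta * v)
    return w + v, v

# Toy check on f(w) = 0.5 * ||w||^2, whose gradient is simply w.
grad = lambda w: w
w, v = np.ones(3), np.zeros(3)
for _ in range(200):
    w, v = nag_step(w, v, grad)
print(w)  # close to the minimizer at the origin
```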

Cited by 7 publications (6 citation statements)
References 35 publications

“…Convolutional neural network architectures interfere with the L2 regularization and make minimization of the loss too difficult for SGDW. However, two techniques can improve minimization of the loss function: projection [38] and hyperparameter methods [39]. Before introducing the projection technique for SGD, it is necessary to recall batch normalization [40].…”
Section: SGD-Type Algorithms
Mentioning, confidence: 99%
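The quoted passage contrasts plain L2 regularization with SGDW, i.e. SGD with decoupled weight decay. A minimal sketch of the difference, assuming a momentum buffer `v` and illustrative hyperparameters `lr`, `beta`, and `wd` (none taken from the cited works), might look like:

```python
def sgd_momentum_l2(w, v, grad, lr=0.1, beta=0.9, wd=1e-4):
    """L2 regularization: the decay term wd*w is added to the gradient,
    so it also accumulates inside the momentum buffer v."""
    g = grad(w) + wd * w
    v = beta * v + g
    return w - lr * v, v

def sgdw_momentum(w, v, grad, lr=0.1, beta=0.9, wd=1e-4):
    """Decoupled weight decay (SGDW-style): the decay acts directly on the
    weights and is kept out of the momentum buffer."""
    v = beta * v + grad(w)
    return w - lr * v - lr * wd * w, v
```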
“…Besides momentum and the Nesterov condition, this algorithm can be equipped with L2 regularization (an extension of weight decay), projection, and hyper-parameters, which significantly increase test accuracy in various types of neural networks. These tools remain relevant in other modifications of SGD, such as SGDW [11], SGDP [12], and QHM [13]. For achieving higher accuracy in every artificial neural network, SGDM with the Nesterov condition is not the most appropriate approach.…”
Section: Preliminaries, A. Gradient Descent With Step-Size Adaptation
Mentioning, confidence: 99%
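Of the variants named here, QHM [13] (quasi-hyperbolic momentum) replaces the update direction with a convex combination of the raw gradient and a momentum buffer. A minimal sketch, with illustrative hyperparameters `lr`, `beta`, and `nu` rather than values from the cited work, might be:

```python
def qhm_step(w, d, grad, lr=0.1, beta=0.999, nu=0.7):
    """Quasi-hyperbolic momentum (QHM-style): mix the raw gradient g and
    the exponential moving average d with interpolation weight nu."""
    g = grad(w)
    d = beta * d + (1.0 - beta) * g            # EMA of past gradients
    w = w - lr * ((1.0 - nu) * g + nu * d)     # nu=0 is plain SGD, nu=1 is EMA-momentum
    return w, d
```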
“…The architecture of these neural networks interferes with the L2 regularization and makes the process of minimization too difficult for SGDW. But there are two techniques that can improve the quality of loss minimization: projection [31] and hyper-parameter methods [32].…”
Section: SGD-Type Algorithms
Mentioning, confidence: 99%