In this paper, we provide new results and algorithms (including backtracking versions of Nesterov accelerated gradient and Momentum) which are more applicable to large-scale optimisation, as in deep neural networks. We also demonstrate that Backtracking Gradient Descent (Backtracking GD) can obtain good upper bound estimates for local Lipschitz constants of the gradient, and that the convergence rate of Backtracking GD is similar to that in the classical work of Armijo. Experiments with the CIFAR10 and CIFAR100 datasets on various popular architectures verify a heuristic argument that, in the mini-batch setting, Backtracking GD stabilises to a finite union of sequences constructed from Standard GD, and show that our new algorithms (while automatically fine-tuning learning rates) perform better than current state-of-the-art methods such as Adam, Adagrad, Adadelta, RMSProp, Momentum and Nesterov accelerated gradient. To help readers avoid confusing heuristics with more rigorously justified algorithms, we also provide a review of the current state of convergence results for gradient descent methods. Accompanying source code is available on GitHub.
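The core of Backtracking GD is the Armijo line search: at each iteration the learning rate is shrunk until a sufficient-decrease condition holds. The following is a minimal sketch of one such step; the function, parameter names and default values (`delta0`, `alpha`, `beta`) are illustrative choices, not taken from the paper's released code.

```python
import numpy as np

def backtracking_gd_step(f, grad_f, x, delta0=1.0, alpha=0.5, beta=0.5):
    """One gradient-descent step with Armijo backtracking line search.

    The learning rate delta is repeatedly scaled by beta until the
    Armijo sufficient-decrease condition
        f(x - delta * g) <= f(x) - alpha * delta * ||g||^2
    is satisfied.  This automatically adapts delta to the local
    Lipschitz behaviour of the gradient.
    """
    g = grad_f(x)
    delta = delta0
    while f(x - delta * g) > f(x) - alpha * delta * np.dot(g, g):
        delta *= beta
    return x - delta * g, delta

# Example: minimise the convex quadratic f(x) = ||x||^2 / 2
f = lambda x: 0.5 * np.dot(x, x)
grad_f = lambda x: x
x = np.array([3.0, -4.0])
for _ in range(50):
    x, delta = backtracking_gd_step(f, grad_f, x)
```

Because the accepted `delta` satisfies the Armijo condition, the ratio `1 / delta` also serves as a rough upper-bound estimate for the local Lipschitz constant of the gradient near `x`, which is the mechanism the abstract alludes to.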
We propose in this paper New Q-Newton's method. The update rule is conceptually very simple, using projections onto the vector subspaces generated by eigenvectors of positive (respectively, negative) eigenvalues of the Hessian. The main result of this paper roughly says that if a sequence $\{x_n\}$ constructed by the method from a random initial point $x_0$ converges, then the limit point is a critical point but not a saddle point, and the convergence rate is the same as that of Newton's method. A subsequent work has recently succeeded in incorporating Backtracking line search into New Q-Newton's method, thus resolving the global convergence issue observed for some (non-smooth) functions. An application to quickly finding zeros of a univariate meromorphic function is discussed, accompanied by an illustration of basins of attraction.
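The projection-based update can be sketched as follows: solve the Newton system in the Hessian's eigenbasis, then flip the sign of the step's components lying in the negative-eigenvalue subspace, so the iterate is pushed away from saddle points rather than attracted to them. This is a minimal illustration only; it assumes the Hessian at `x` is invertible and omits the perturbation term the paper uses to guarantee invertibility.

```python
import numpy as np

def new_q_newton_step(grad_f, hess_f, x):
    """One sketch step of a New Q-Newton-style update (perturbation omitted).

    The Newton step H^{-1} g is computed in the eigenbasis of the
    symmetric Hessian H; components corresponding to negative
    eigenvalues then have their sign flipped, which turns saddle
    points into repellers while keeping Newton's local convergence
    rate near nondegenerate minima.
    """
    g = grad_f(x)
    H = hess_f(x)
    eigvals, eigvecs = np.linalg.eigh(H)   # real eigendecomposition of symmetric H
    coeffs = eigvecs.T @ g / eigvals       # Newton step expressed in the eigenbasis
    coeffs *= np.sign(eigvals)             # reflect the negative-eigenvalue components
    return x - eigvecs @ coeffs

# Example: f(x, y) = x^2 - y^2 has a saddle at the origin.
grad_f = lambda x: np.array([2 * x[0], -2 * x[1]])
hess_f = lambda x: np.array([[2.0, 0.0], [0.0, -2.0]])
x = new_q_newton_step(grad_f, hess_f, np.array([1.0, 1.0]))
```

In the example, plain Newton's method would jump straight to the saddle at the origin; the sign flip instead sends the `y`-coordinate away from it (here to `[0, 2]`), which is the saddle-avoidance behaviour the abstract describes.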