2017
DOI: 10.48550/arxiv.1704.00116
Preprint

Stochastic L-BFGS: Improved Convergence Rates and Practical Acceleration Strategies

Renbo Zhao, William B. Haskell, Vincent Y. F. Tan

Abstract: We revisit the stochastic limited-memory BFGS (L-BFGS) algorithm. By proposing a new coordinate transformation framework for the convergence analysis, we prove improved convergence rates and computational complexities of the stochastic L-BFGS algorithms compared to previous works. In addition, we propose several practical acceleration strategies to speed up the empirical performance of such algorithms. We also provide theoretical analyses for most of the strategies. Experiments on large-scale logistic and ridg…
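For orientation, here is a minimal, illustrative Python sketch of a stochastic L-BFGS iteration of the kind the abstract describes: mini-batch gradients supply both the curvature pairs and the input to the two-loop recursion that produces the search direction. This is a toy sketch under simplifying assumptions (fixed step size, curvature pairs formed from gradients of two different mini-batches), not the authors' algorithm or their acceleration strategies.

```python
import numpy as np
from collections import deque

def two_loop_direction(grad, s_list, y_list):
    """L-BFGS two-loop recursion: approximate -H*grad from stored (s, y) pairs."""
    q = grad.copy()
    rhos = [1.0 / np.dot(y, s) for s, y in zip(s_list, y_list)]
    alphas = []
    # First loop: newest pair to oldest pair.
    for s, y, rho in zip(reversed(s_list), reversed(y_list), reversed(rhos)):
        a = rho * np.dot(s, q)
        alphas.append(a)
        q -= a * y
    # Initial Hessian scaling from the most recent curvature pair.
    gamma = (np.dot(s_list[-1], y_list[-1]) / np.dot(y_list[-1], y_list[-1])
             if s_list else 1.0)
    r = gamma * q
    # Second loop: oldest pair to newest pair.
    for s, y, rho, a in zip(s_list, y_list, rhos, reversed(alphas)):
        b = rho * np.dot(y, r)
        r += (a - b) * s
    return -r  # descent direction

def stochastic_lbfgs(grad_fn, w0, n, memory=10, batch=32, lr=0.05, iters=200, seed=0):
    """Toy stochastic L-BFGS loop: mini-batch gradients plus limited-memory pairs.
    grad_fn(w, idx) must return the mini-batch gradient at w over sample indices idx."""
    rng = np.random.default_rng(seed)
    w = w0.copy()
    s_hist, y_hist = deque(maxlen=memory), deque(maxlen=memory)
    prev_w, prev_g = None, None
    for _ in range(iters):
        idx = rng.choice(n, size=batch, replace=False)
        g = grad_fn(w, idx)
        if prev_w is not None:
            # Simplification: s and y come from gradients of two *different* batches;
            # published stochastic L-BFGS variants use more careful curvature estimates.
            s, y = w - prev_w, g - prev_g
            if np.dot(s, y) > 1e-10:  # keep only pairs with positive curvature
                s_hist.append(s); y_hist.append(y)
        d = two_loop_direction(g, list(s_hist), list(y_hist))
        prev_w, prev_g = w.copy(), g
        w = w + lr * d
    return w

# Hypothetical usage on a regularized least-squares problem (illustrative only):
# A, b = np.random.randn(1000, 20), np.random.randn(1000)
# grad = lambda w, idx: A[idx].T @ (A[idx] @ w - b[idx]) / len(idx) + 0.1 * w
# w_star = stochastic_lbfgs(grad, np.zeros(20), n=1000)
```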


Cited by 1 publication (2 citation statements), published 2021
References 19 publications
“…Reference   Parallel                            Convergence
  [22]                                            sublinear
  [25]                                            linear
  [23]                                            linear
  [26]                                            linear
  [29]        parallel two-loop recursion         −
  [30]        map-reduce for gradient             sublinear
  [31]        map-reduce for gradient             linear
  [32, 33]    parallel calculation for gradient   −
  [34]        parallel calculation for Hessian    superlinear
  AsySQN      parallel model for L-BFGS           linear
…sion can be calculated fast. As a successful trial to create both stochastic and parallel algorithms, multi-batch L-BFGS [30] uses map-reduce to compute both the gradients and the updating rules for L-BFGS.…”
Section: QN Methods Stochastic (mentioning)
confidence: 99%
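The last sentence of this statement describes multi-batch L-BFGS [30] computing gradients with map-reduce. The sketch below shows only that generic pattern for a sum-structured least-squares gradient, with a local process pool standing in for a cluster; the data, shapes, and function names are illustrative assumptions, not the method of [30].

```python
import numpy as np
from multiprocessing import Pool

def partial_gradient(args):
    """Map step: un-normalized least-squares gradient on one data shard."""
    A_shard, b_shard, w = args
    return A_shard.T @ (A_shard @ w - b_shard)

def mapreduce_gradient(A, b, w, n_workers=4):
    """Shard the samples, map partial gradients, reduce by summing."""
    shard_idx = np.array_split(np.arange(b.shape[0]), n_workers)
    tasks = [(A[s], b[s], w) for s in shard_idx]
    with Pool(n_workers) as pool:      # stands in for the cluster's map phase
        parts = pool.map(partial_gradient, tasks)
    return sum(parts) / b.shape[0]     # reduce phase

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A, b, w = rng.normal(size=(1000, 20)), rng.normal(size=1000), np.zeros(20)
    g = mapreduce_gradient(A, b, w)
    # Sanity check against the single-machine gradient.
    assert np.allclose(g, A.T @ (A @ w - b) / 1000)
```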
“…Using the variance reduction (VR) technique proposed in [24], the convergence rate can be improved to linear in the latest attempts [25,23]. Later, acceleration strategies [26] combine VR with non-uniform mini-batch subsampling and momentum computation to derive a fast and practical stochastic algorithm. Another line of stochastic quasi-Newton work focuses on self-concordant objective functions; it places stronger requirements on the shape and properties of the objective but can reach a linear convergence rate.…”
Section: Introduction (mentioning)
confidence: 99%
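The variance-reduction technique referenced as [24] is described as lifting the rate to linear; estimators of this kind correct each stochastic gradient with a periodically refreshed full-gradient snapshot. Below is a minimal SVRG-style sketch in Python; treating [24] as this specific estimator is an assumption, and all names and defaults are illustrative.

```python
import numpy as np

def svrg(grad_i, full_grad, w0, n, epochs=10, inner=200, lr=0.1, seed=0):
    """Plain variance-reduced SGD loop. Stochastic quasi-Newton variants feed the
    same estimator into their L-BFGS direction instead of a raw mini-batch gradient.
    grad_i(w, i): gradient of the i-th sample; full_grad(w): gradient over all n samples."""
    rng = np.random.default_rng(seed)
    w = w0.copy()
    for _ in range(epochs):
        w_snap = w.copy()
        mu = full_grad(w_snap)        # full-gradient snapshot, recomputed once per epoch
        for _ in range(inner):
            i = int(rng.integers(n))
            # Variance-reduced gradient: unbiased for the full gradient at w,
            # with variance shrinking as w approaches the snapshot (and the optimum).
            g = grad_i(w, i) - grad_i(w_snap, i) + mu
            w = w - lr * g
    return w
```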