2016
DOI: 10.1137/140954362

A Stochastic Quasi-Newton Method for Large-Scale Optimization

Abstract: The question of how to incorporate curvature information in stochastic approximation methods is challenging. The direct application of classical quasi-Newton updating techniques for deterministic optimization leads to noisy curvature estimates that have harmful effects on the robustness of the iteration. In this paper, we propose a stochastic quasi-Newton method that is efficient, robust and scalable. It employs the classical BFGS update formula in its limited memory form, and is based on the observation that …
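
The method the abstract describes, a limited-memory BFGS update fed with curvature information gathered at spaced intervals through subsampled Hessian-vector products (as the citing papers below also summarize), can be illustrated with a small sketch. The following is a minimal, hedged illustration on a toy least-squares problem, not the authors' implementation: all names (`stoch_grad`, `subsampled_hess_vec`, the window length `L`, the memory size `M`) and parameter values are chosen for exposition only.

```python
import numpy as np

# Toy least-squares problem: F(x) = ||Ax - b||^2 / (2n).
rng = np.random.default_rng(0)
n, d = 1000, 20
A = rng.normal(size=(n, d))
x_true = rng.normal(size=d)
b = A @ x_true + 0.1 * rng.normal(size=n)

def stoch_grad(x, idx):
    """Mini-batch gradient of F at x over the rows indexed by idx."""
    Ai = A[idx]
    return Ai.T @ (Ai @ x - b[idx]) / len(idx)

def subsampled_hess_vec(v, idx):
    """Subsampled Hessian-vector product; here the Hessian on S is A_S^T A_S / |S|."""
    Ai = A[idx]
    return Ai.T @ (Ai @ v) / len(idx)

def two_loop(g, mem):
    """Standard L-BFGS two-loop recursion applied to the gradient estimate g."""
    q, alphas = g.copy(), []
    for s, y, rho in reversed(mem):
        a = rho * s.dot(q)
        alphas.append(a)
        q -= a * y
    if mem:
        s, y, _ = mem[-1]
        q *= s.dot(y) / y.dot(y)               # usual initial Hessian scaling
    for (s, y, rho), a in zip(mem, reversed(alphas)):
        q += (a - rho * y.dot(q)) * s
    return q

x, x_bar, x_bar_prev = np.zeros(d), np.zeros(d), None
mem, M, L, g_batch, h_batch = [], 10, 20, 32, 256
for t in range(1, 2001):
    g = stoch_grad(x, rng.choice(n, g_batch, replace=False))
    x -= 0.1 / np.sqrt(t) * two_loop(g, mem)   # quasi-Newton step, diminishing step size
    x_bar += x / L                             # running average of iterates in the window
    if t % L == 0:                             # collect one curvature pair every L steps
        if x_bar_prev is not None:
            s = x_bar - x_bar_prev
            y = subsampled_hess_vec(s, rng.choice(n, h_batch, replace=False))
            if s.dot(y) > 1e-10:               # keep the pair only if curvature is positive
                mem = (mem + [(s, y, 1.0 / s.dot(y))])[-M:]   # limited memory of M pairs
        x_bar_prev, x_bar = x_bar, np.zeros(d)

print("distance to solution:", np.linalg.norm(x - x_true))
```

The design point in this sketch is that the pair (s, y) is formed from a Hessian-vector product on a dedicated subsample of averaged iterates rather than from differences of noisy mini-batch gradients, which is the kind of curvature estimate the citing papers below attribute to this work.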

Cited by 328 publications (324 citation statements)
References 16 publications

“…The quasi-Newton method proposed in [5] uses some subsampled Hessian algorithms via the sample average approximation (SAA) approach to estimate Hessian-vector multiplications. In [6], the authors proposed to use the SA approach instead of SAA to estimate the curvature information. This stochastic quasi-Newton method is based on L-BFGS [26] and performs very well in some problems arising from machine learning, but no theoretical convergence analysis was provided in [6].…”
mentioning
confidence: 99%
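
The distinction this quote draws between the SAA and SA approaches to estimating curvature can be made concrete with a small sketch. The toy least-squares data, the helper `hess_vec`, and the subsample sizes below are illustrative assumptions, not code from [5] or [6]: SAA fixes one subsample up front and reuses it, whereas SA redraws the subsample each time the Hessian-vector product is needed.

```python
import numpy as np

# Toy least-squares model; the Hessian restricted to a subsample S is A_S^T A_S / |S|.
rng = np.random.default_rng(1)
n, d = 500, 10
A = rng.normal(size=(n, d))

def hess_vec(v, idx):
    """Hessian-vector product computed on the subsample idx only."""
    Ai = A[idx]
    return Ai.T @ (Ai @ v) / len(idx)

v = rng.normal(size=d)

# SAA: one subsample is drawn up front and reused for every subsequent product,
# so the algorithm works with a single, fixed approximation of the Hessian.
saa_idx = rng.choice(n, 100, replace=False)
hv_saa = hess_vec(v, saa_idx)

# SA: a fresh subsample is drawn each time the product is needed, giving an
# unbiased but noisy estimate that changes from call to call.
hv_sa = hess_vec(v, rng.choice(n, 100, replace=False))

print(np.linalg.norm(hv_saa - hv_sa))   # the two estimates generally differ
```
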
“…In [6], the authors proposed to use the SA approach instead of SAA to estimate the curvature information. This stochastic quasi-Newton method is based on L-BFGS [26] and performs very well in some problems arising from machine learning, but no theoretical convergence analysis was provided in [6]. Stochastic quasi-Newton methods based on BFGS and L-BFGS updates were also studied for online convex optimization in Schraudolph et al [42], with no convergence analysis provided, either.…”
mentioning
confidence: 99%
“…This and several other practical issues have been recently addressed in [2]. Finally, another class of extensions to SGD is stochastic quasi-Newton methods [6,11]. Despite their clear potential, a lack of theoretical understanding and complicated implementation issues compared to those above may still limit their adoption in the wider community.…”
mentioning
confidence: 99%
“…An interested reader should consult [32,34] for initial guidance on stochastic optimization problems. Several natural extensions are easily incorporated into the framework considered in this paper, for example search directions with second-order information, which usually yield faster convergence but also incur additional cost [8,9,10]. In order to decrease the linear algebra costs, one can consider preconditioners, although their construction might be a nontrivial issue due to the presence of the random variable.”
Section: Discussion
mentioning
confidence: 99%
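
The remark about preconditioners in the last quote can be illustrated with one of the cheapest options: a diagonal preconditioner accumulated from past stochastic gradients (AdaGrad-style scaling). This is a generic sketch under that assumption, not the construction discussed in [8,9,10]; the toy objective, step size, and iteration count are made up for the example.

```python
import numpy as np

def precond_sgd(stoch_grad, x0, steps=2000, lr=0.5, eps=1e-8, seed=0):
    """SGD with a diagonal preconditioner accumulated from past stochastic gradients."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    accum = np.zeros_like(x)                      # running sum of squared gradient entries
    for _ in range(steps):
        g = stoch_grad(x, rng)
        accum += g * g
        x -= lr * g / (np.sqrt(accum) + eps)      # cheap diagonal scaling of the step
    return x

# Toy usage: minimize E[(a^T x - b)^2] / 2 where b = a^T x_star for random a.
d = 5
x_star = np.arange(1.0, d + 1)

def sg(x, rng):
    a = rng.normal(size=d)
    return a * (a @ x - a @ x_star)               # one-sample stochastic gradient

print(precond_sgd(sg, np.zeros(d)))               # should approach x_star
```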