Abstract. Composite optimization models consist of the minimization of the sum of a smooth (not necessarily convex) function and a nonsmooth convex function. Such models arise in many applications where, in addition to the composite nature of the objective function, a hierarchy of models is readily available. It is common to take advantage of this hierarchy by first solving a low fidelity model and then using the solution as a starting point for a high fidelity model. We adopt an optimization point of view and show how to take advantage of the availability of a hierarchy of models in a consistent manner. We do not use the low fidelity model just for the computation of promising starting points but also for the computation of search directions. We establish the convergence and the convergence rate of the proposed algorithm. Our numerical experiments on large scale image restoration problems and the transition path problem suggest that, for certain classes of problems, the proposed algorithm is significantly faster than the state of the art.

1. Introduction. It is often possible to exploit the structure of large scale optimization models in order to develop algorithms with lower computational complexity. We consider the case in which the fidelity with which the optimization model captures the underlying application can be controlled. Typical examples include the discretization of partial differential equations in computer vision and optimal control [5], the number of features in machine learning applications [30], the number of states in Markov decision processes [27], and nonlinear inverse problems [25]. Indeed, any time a finite dimensional optimization model arises from an infinite dimensional model, it is straightforward to define such a hierarchy of optimization models.

In many areas it is common to take advantage of this structure by solving a low fidelity (coarse) model and then using the solution as a starting point for the high fidelity (fine) model. In this paper we adopt an optimization point of view and show how to take advantage of the availability of a hierarchy of models in a consistent manner. We do not use the coarse model just for the computation of promising starting points but also for the computation of search directions.

We consider optimization models that consist of the sum of a smooth but not necessarily convex function and a nonsmooth convex function. Problems of this kind are referred to as composite optimization models. The algorithm we propose is similar to the proximal gradient method (PGM). There is a substantial amount of literature on proximal algorithms, and we refer the reader to [26] for a review of recent developments. The main difference between PGM and the algorithm we propose is that we use both gradient information and a coarse model in order to compute a search direction. This modification of PGM for the computation of the search direction is akin to multigrid algorithms developed recently.
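For concreteness, the composite model and the standard proximal gradient update that the proposed method modifies can be sketched as follows; the symbols $f$, $g$, $t_k$, and $\operatorname{prox}$ are our notation for this sketch and are not taken from the text above:
\[
\min_{x \in \mathbb{R}^n} \; F(x) \;=\; f(x) + g(x),
\]
where $f$ is smooth (possibly nonconvex) and $g$ is convex but possibly nonsmooth. With step size $t_k > 0$, the standard PGM iteration reads
\[
x_{k+1} \;=\; \operatorname{prox}_{t_k g}\bigl(x_k - t_k \nabla f(x_k)\bigr),
\qquad
\operatorname{prox}_{t g}(y) \;=\; \arg\min_{x} \Bigl\{ g(x) + \tfrac{1}{2t}\,\|x - y\|^2 \Bigr\}.
\]
In the algorithm discussed here, the gradient step $-t_k \nabla f(x_k)$ is replaced by a search direction computed from both gradient information and a coarse model.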