We consider the global efficiency of algorithms for minimizing the sum of a convex function and a composition of a Lipschitz convex function with a smooth map. The basic algorithm we rely on is the prox-linear method, which in each iteration solves a regularized subproblem formed by linearizing the smooth map. When the subproblems are solved exactly, the method has efficiency O(1/ε²), akin to gradient descent for smooth minimization. We show that when the subproblems can only be solved by first-order methods, a simple combination of smoothing, the prox-linear method, and a fast-gradient scheme yields an algorithm with complexity O(1/ε³). The technique readily extends to minimizing an average of m composite functions, with expected complexity O(m/ε² + √m/ε³). We round off the paper with an inertial prox-linear method that automatically accelerates in the presence of convexity.

The proximal gradient algorithm, investigated by Beck-Teboulle [4] and Nesterov [54, Section 3], is a popular first-order method for additive composite minimization. Much of the current paper centers around the prox-linear method, a direct extension of the prox-gradient algorithm to the entire problem class (1.1), namely min_x g(x) + h(c(x)) with g convex, h Lipschitz convex, and c a smooth map. In each iteration, the prox-linear method linearizes the smooth map c(·) and solves the proximal subproblem

    x⁺ = argmin_y { g(y) + h(c(x) + ∇c(x)(y − x)) + (1/(2t))‖y − x‖² }    (1.3)

for an appropriately chosen parameter t > 0. The underlying assumption here is that the strongly convex proximal subproblems (1.3) can be solved efficiently. This is indeed reasonable in some circumstances: specialized methods may be available for the proximal subproblems, interior-point methods may be applicable for moderate dimensions d and m, or it may be the case that computing an accurate estimate of ∇c(x) is already the bottleneck (see e.g. Example 3.5). The prox-linear method was recently investigated in [13,23,38,53], though the ideas behind the algorithm and its trust-region variants are much older [8,13,28,58,59,70,72].
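To make the iteration concrete, here is a minimal sketch of the prox-linear update for the nonlinear least-squares instance g ≡ 0, h = ½‖·‖², where the subproblem (1.3) reduces to a damped Gauss-Newton step in closed form. The Rosenbrock-style residual map c and the step parameter t are illustrative choices, not from the paper.

```python
import numpy as np

def c(x):
    # smooth map: residuals of a Rosenbrock-type least-squares problem
    return np.array([10.0 * (x[1] - x[0] ** 2), 1.0 - x[0]])

def jac(x):
    # Jacobian of c at x
    return np.array([[-20.0 * x[0], 10.0],
                     [-1.0, 0.0]])

def F(x):
    # composite objective with g = 0 and h = 0.5 * ||.||^2
    return 0.5 * np.dot(c(x), c(x))

def prox_linear(x, t=0.01, iters=100):
    # each step minimizes 0.5*||c(x) + J d||^2 + (1/(2t))*||d||^2 over d,
    # i.e. solves the normal equations (J^T J + I/t) d = -J^T c(x)
    for _ in range(iters):
        Jx, cx = jac(x), c(x)
        d = np.linalg.solve(Jx.T @ Jx + np.eye(len(x)) / t, -Jx.T @ cx)
        x = x + d
    return x
```

With this choice of h, the prox-linear step coincides with a Levenberg-Marquardt step with damping 1/t, which is one way to see the connection to Gauss-Newton noted below.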
The scheme (1.3) reduces to the popular prox-gradient algorithm for additive composite minimization, while for nonlinear least squares the algorithm is closely related to the Gauss-Newton algorithm [55, Section 10]. Our work focuses on global efficiency estimates of numerical methods. Therefore, in line with standard assumptions in the literature, we assume that h is L-Lipschitz and the Jacobian map ∇c is β-Lipschitz. As in the analysis of the prox-gradient method in Nesterov [48,52], it is convenient to measure the progress of the prox-linear method in terms of the scaled steps, called the prox-gradients:

    G_t(x) := t⁻¹(x − x⁺),

where x⁺ is the solution of the proximal subproblem (1.3). A short argument shows that with the optimal choice t = (Lβ)⁻¹, the prox-linear algorithm will find a point x satisfying ‖G_{1/(Lβ)}(x)‖ ≤ ε after at most O((Lβ/ε²)·(F(x₀) − inf F)) iterations; see e.g. [13,23]. We mention in passing that iterate convergence under the KL-inequality was recently shown in [5,56], while local linear/quadratic rates under appropriate regularity conditions were proved in [11,23,53]. The contributions of our work are as foll...
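In the additive composite case the prox-gradient G_t is cheap to evaluate exactly. A minimal sketch for the illustrative instance f(x) = ½‖Ax − b‖² + λ‖x‖₁ (not a problem from the paper), where the proximal step is soft-thresholding and ‖G_t(x)‖ serves as the stationarity measure:

```python
import numpy as np

def soft_threshold(v, tau):
    # proximal operator of tau * ||.||_1
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def prox_grad_step(x, A, b, lam, t):
    # one prox-gradient step for 0.5*||Ax - b||^2 + lam*||x||_1
    grad = A.T @ (A @ x - b)
    x_plus = soft_threshold(x - t * grad, t * lam)
    G = (x - x_plus) / t          # the prox-gradient (scaled step)
    return x_plus, np.linalg.norm(G)

A = np.array([[1.0, 0.0], [0.0, 2.0]])
b = np.array([1.0, 1.0])
lam, t = 0.1, 0.25                # t = 1/L with L = ||A^T A|| = 4
x = np.zeros(2)
for _ in range(500):
    x, gnorm = prox_grad_step(x, A, b, lam, t)
```

Iterating until ‖G_t(x)‖ ≤ ε gives exactly the kind of near-stationarity guarantee quantified in the complexity bound above.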
For deterministic optimization, line-search methods augment algorithms by providing stability and improved efficiency. We adapt a classical backtracking Armijo line-search to the stochastic optimization setting. While traditional line-search relies on exact computations of the gradient and of the objective value, our method assumes that these quantities are available only up to some dynamically adjusted accuracy, which holds with a sufficiently large, but fixed, probability. We show that the expected number of iterations needed to reach a near-stationary point matches the worst-case efficiency of typical first-order methods, while for convex and strongly convex objectives it achieves the rates of deterministic gradient descent in function values.
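For reference, the classical deterministic backtracking Armijo line-search that the stochastic method adapts can be sketched as follows; the stochastic variant replaces f and its gradient with estimates that are sufficiently accurate with fixed probability. The constants c1, rho, and the iteration cap below are conventional illustrative choices.

```python
import numpy as np

def armijo_backtracking(f, grad, x, alpha0=1.0, c1=1e-4, rho=0.5, max_back=50):
    # shrink the step until the sufficient-decrease (Armijo) condition holds:
    #   f(x - alpha*g) <= f(x) - c1 * alpha * ||g||^2
    g = grad(x)
    fx = f(x)
    alpha = alpha0
    for _ in range(max_back):
        if f(x - alpha * g) <= fx - c1 * alpha * np.dot(g, g):
            return alpha
        alpha *= rho
    return alpha
```

On a well-scaled quadratic such as f(x) = ½‖x‖², the initial unit step already satisfies the condition, so no backtracking occurs; the dynamic-accuracy analysis handles the case where fx and g are only probabilistically correct.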
Subgradient methods converge linearly on a convex function that grows sharply away from its solution set. In this work, we show that the same is true for sharp functions that are only weakly convex, provided that the subgradient methods are initialized within a fixed tube around the solution set. A variety of statistical and signal processing tasks come equipped with good initialization, and provably lead to formulations that are both weakly convex and sharp. Therefore, in such settings, subgradient methods can serve as inexpensive local search procedures. We illustrate the proposed techniques on phase retrieval and covariance estimation problems.
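When the minimal value f* is known, as in the noiseless recovery problems alluded to above, a standard step-size choice is the Polyak subgradient step. Below is a minimal sketch on the sharp (here, even convex) model function f(x) = ‖x‖₁ with f* = 0; the tolerance and iteration cap are illustrative.

```python
import numpy as np

def polyak_subgradient(f, subgrad, fstar, x, iters=100, tol=1e-10):
    # Polyak step: move along a subgradient g with step (f(x) - f*) / ||g||^2
    for _ in range(iters):
        gap = f(x) - fstar
        if gap <= tol:
            break
        g = subgrad(x)
        x = x - (gap / np.dot(g, g)) * g
    return x

# np.sign(x) is a valid subgradient of the l1-norm
x = polyak_subgradient(lambda z: np.abs(z).sum(),
                       np.sign, 0.0, np.array([1.0, 0.5]))
```

Sharpness is what makes this step size effective: the gap f(x) − f* is proportional to the distance to the solution set, so the steps contract that distance geometrically when started inside the tube.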
We consider a popular nonsmooth formulation of the real phase retrieval problem. We show that under standard statistical assumptions, a simple subgradient method converges linearly when initialized within a constant relative distance of an optimal solution. Seeking to understand the distribution of the stationary points of the problem, we complete the paper by proving that as the number of Gaussian measurements increases, the stationary points converge to a codimension two set, at a controlled rate. Experiments on image recovery problems illustrate the developed algorithm and theory.
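A minimal sketch of such a subgradient method on the robust formulation f(x) = (1/m) Σᵢ |⟨aᵢ, x⟩² − bᵢ|, using Polyak steps with f* = 0 in the noiseless setting; the problem sizes, random seed, and initialization radius are illustrative choices, not those of the paper's experiments.

```python
import numpy as np

rng = np.random.default_rng(0)
m, d = 50, 5
A = rng.standard_normal((m, d))       # Gaussian measurement vectors a_i (rows)
x_true = rng.standard_normal(d)
b = (A @ x_true) ** 2                 # noiseless quadratic measurements

def f(x):
    # robust (l1) phase retrieval objective
    return np.mean(np.abs((A @ x) ** 2 - b))

def subgrad(x):
    # a subgradient: (1/m) sum_i sign(<a_i,x>^2 - b_i) * 2<a_i,x> a_i
    r = (A @ x) ** 2 - b
    return (2.0 / m) * (A.T @ (np.sign(r) * (A @ x)))

x = x_true + 0.1 * rng.standard_normal(d)   # initialize near a solution
f0 = f(x)
for _ in range(200):
    g = subgrad(x)
    gsq = np.dot(g, g)
    if gsq == 0.0:
        break
    x = x - (f(x) / gsq) * g                # Polyak step with f* = 0
```

Note the sign ambiguity: ±x_true are both optimal, which is why initialization within a constant relative distance of one of them matters.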