We prove impossibility results for adaptivity in non-smooth stochastic convex optimization. Given a set of problem parameters we wish to adapt to, we define a "price of adaptivity" (PoA) that, roughly speaking, measures the multiplicative increase in suboptimality due to uncertainty in these parameters. When the initial distance to the optimum is unknown but a gradient norm bound is known, we show that the PoA is at least logarithmic for expected suboptimality, and double-logarithmic for median suboptimality. When there is uncertainty in both distance and gradient norm, we show that the PoA must be polynomial in the level of uncertainty. Our lower bounds nearly match existing upper bounds, and establish that there is no parameter-free lunch.
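As a rough illustration (a hedged formalization of our own, not the abstract's exact definition), the PoA of an adaptive algorithm A over a problem class P can be thought of as a worst-case ratio of suboptimalities:

\[
\mathrm{PoA}(\mathsf{A}) \approx \sup_{P \in \mathcal{P}} \frac{\mathbb{E}[f_P(x_{\mathsf{A}})] - f_P^\star}{\inf_{\mathsf{A}' \text{ tuned for } P}\left(\mathbb{E}[f_P(x_{\mathsf{A}'})] - f_P^\star\right)},
\]

so the logarithmic lower bound says that not knowing the initial distance to the optimum must inflate expected suboptimality by at least a logarithmic factor in the level of uncertainty.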
We prove lower bounds on the complexity of finding ε-stationary points (points x such that ‖∇f(x)‖ ≤ ε) of smooth, high-dimensional, and potentially non-convex functions f. We consider oracle-based complexity measures, where an algorithm is given access to the value and all derivatives of f at a query point x. We show that for any (potentially randomized) algorithm A, there exists a function f with Lipschitz pth order derivatives such that A requires at least ε^(−(p+1)/p) queries to find an ε-stationary point. Our lower bounds are sharp to within constants, and they show that gradient descent, cubic-regularized Newton's method, and generalized pth order regularization are worst-case optimal within their natural function classes.
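To make the rate concrete, the following sketch (illustrative only; the methods paired with each p are the ones the abstract names) evaluates the lower-bound exponent (p+1)/p for small p:

```python
# Illustrative sketch: the query lower bound is eps^(-(p+1)/p) for functions
# with Lipschitz pth order derivatives. For p = 1 this gives eps^(-2)
# (matching gradient descent) and for p = 2 it gives eps^(-3/2)
# (matching cubic-regularized Newton), which is why these methods are
# worst-case optimal within their natural function classes.
for p, method in [(1, "gradient descent"),
                  (2, "cubic-regularized Newton"),
                  (3, "3rd-order regularization")]:
    exponent = (p + 1) / p
    print(f"p = {p} ({method}): at least eps^(-{exponent:g}) queries")
```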
We present an accelerated gradient method for non-convex optimization problems with Lipschitz continuous first and second derivatives. The method requires time O(ε^(−7/4) log(1/ε)) to find an ε-stationary point, meaning a point x such that ‖∇f(x)‖ ≤ ε. The method improves upon the O(ε^(−2)) complexity of gradient descent and provides the additional second-order guarantee that ∇²f(x) ⪰ −O(ε^(1/2))I for the computed x. Furthermore, our method is Hessian-free, i.e., it only requires gradient computations, and is therefore suitable for large-scale applications.
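As a side note on the Hessian-free property: second-order information can be extracted from gradients alone via finite-difference Hessian-vector products. The sketch below (a generic textbook construction, not the paper's algorithm; the function names are our own) illustrates the idea:

```python
import numpy as np

def hessian_vector_product(grad, x, v, h=1e-6):
    """Approximate H(x) @ v using only two gradient evaluations:
    H(x) v ~ (grad(x + h v) - grad(x)) / h."""
    return (grad(x + h * v) - grad(x)) / h

# Toy check on f(x) = 0.5 * ||x||^2, whose Hessian is the identity,
# so the product should return (approximately) v itself.
grad = lambda x: x
x0 = np.array([1.0, -2.0])
v = np.array([0.5, 0.5])
print(hessian_vector_product(grad, x0, v))  # ~ [0.5 0.5]
```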
We establish lower bounds on the complexity of finding ε-stationary points of smooth, non-convex, high-dimensional functions using first-order methods. We prove that deterministic first-order methods, even applied to arbitrarily smooth functions, cannot achieve convergence rates in ε better than ε^(−8/5), which is within ε^(−1/15) log(1/ε) of the best known rate for such methods. Moreover, for functions with Lipschitz first and second derivatives, we prove that no deterministic first-order method can achieve convergence rates better than ε^(−12/7), while ε^(−2) is a lower bound for functions with only Lipschitz gradient. For convex functions with Lipschitz gradient, accelerated gradient descent achieves the rate ε^(−1) log(1/ε), showing that finding stationary points is easier given convexity.
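For a sense of scale (a hypothetical calculation of our own, with constants and logarithmic factors ignored), here is what these exponents imply at a fixed accuracy:

```python
# Illustrative only: gradient-query counts implied by each rate at eps = 1e-3.
eps = 1e-3
rates = [(8 / 5, "arbitrarily smooth, deterministic lower bound"),
         (12 / 7, "Lipschitz first and second derivatives"),
         (2.0, "Lipschitz gradient only"),
         (1.0, "convex + Lipschitz gradient (AGD, up to log factors)")]
for exponent, setting in rates:
    print(f"eps^(-{exponent:g}) [{setting}]: ~{eps ** -exponent:.1e} queries")
```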