We lower bound the complexity of finding -stationary points (with gradient norm at most ) using stochastic first-order methods. In a well-studied model where algorithms access smooth, potentially non-convex functions through queries to an unbiased stochastic gradient oracle with bounded variance, we prove that (in the worst case) any algorithm requires at least −4 queries to find an stationary point. The lower bound is tight, and establishes that stochastic gradient descent is minimax optimal in this model. In a more restrictive model where the noisy gradient estimates satisfy a mean-squared smoothness property, we prove a lower bound of −3 queries, establishing the optimality of recently proposed variance reduction techniques.
Second-order methods, which utilize gradients as well as Hessians to optimize a given function, are of major importance in mathematical optimization. In this work, we prove tight bounds on the oracle complexity of such methods for smooth convex functions, or equivalently, the worst-case number of iterations required to optimize such functions to a given accuracy. In particular, these bounds indicate when such methods can or cannot improve on gradient-based methods, whose oracle complexity is much better understood. We also provide generalizations of our results to higher-order methods.• Perhaps unexpectedly, Eq. (5) establishes that one cannot avoid in general a polynomial dependence on geometry-dependent "condition numbers" of the form µ 1 /λ or µ 2 D/λ, even with second-order methods. This is despite the ability of such methods to favorably alter the geometry of the problem (for example, the Newton method is well-known to be affine invariant).• To improve on the oracle complexity of first-order methods for strongly-convex problems (Eq. (3)) by more than logarithmic factors, one cannot avoid a polynomial dependence on the initial distance D to the optimum. This is despite the fact that the dependence on D with first-order methods is only logarithmic. In fact, when D is sufficiently large (of order µ 7/4 1 µ 2 λ 3/4 or larger), second-order methods cannot improve on the oracle complexity of first-order methods by more than logarithmic factors. 1 Assuming f is twice-differentiable, this corresponds to ∇ 2 f (w) λI uniformly for all w.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.