Abstract: This work considers the question: what convergence guarantees does the stochastic subgradient method have in the absence of smoothness and convexity? We prove that the stochastic subgradient method, on any semialgebraic locally Lipschitz function, produces limit points that are all first-order stationary. More generally, our result applies to any function with a Whitney stratifiable graph. In particular, this work endows the stochastic subgradient method, and its proximal extension, with rigorous convergence guarantees for a class of problems arising in data science, including all popular deep learning architectures.
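For readers who want the iteration in concrete form, here is a minimal sketch of the stochastic subgradient method on a nonsmooth objective. The ℓ1-regularized least-squares loss, the step-size schedule, and the subgradient selection at kinks are illustrative assumptions, not details drawn from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 10))
x_true = rng.standard_normal(10)
b = A @ x_true + 0.1 * rng.standard_normal(200)

def stoch_subgrad(x, batch=20, lam=0.1):
    """One stochastic subgradient of f(x) = (1/2n)||Ax - b||^2 + lam*||x||_1."""
    idx = rng.choice(len(b), size=batch, replace=False)
    g_smooth = A[idx].T @ (A[idx] @ x - b[idx]) / batch
    # np.sign(0.0) == 0 silently picks one element of the subdifferential
    # at the kinks of the l1 term; any selection in [-1, 1] would do.
    return g_smooth + lam * np.sign(x)

x = np.zeros(10)
for k in range(5000):
    alpha = 1.0 / (k + 100)  # not summable, square summable: the classic schedule
    x -= alpha * stoch_subgrad(x)
```

The objective is nonsmooth (the ℓ1 term) but semialgebraic, so it falls within the class covered by the convergence guarantee above.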
“…Note also that virtually all deep network architectures used in applications are actually definable, see e.g. [24]. (b) Despite our efforts we do not see any means to obtain Corollary 6 easily.…”
Section: Deep Neural Network and Nonsmooth Backpropagation (mentioning; confidence: 98%)
“…Proof: Using the chain rule characterization, all proofs boil down to providing a chain rule with the Clarke subdifferential for each of the above-mentioned situations. We refer to [48] for convex, Clarke regular, and prox-regular functions, and to [24] for tame functions.…”
Section: Corollary 3 (Integrability and Clarke Subdifferential) (mentioning; confidence: 99%)
“…On a more applied side, conservative fields make it possible to analyze fundamental modern numerical algorithms in machine learning and numerical analysis that are based on automatic differentiation [49,28] and decomposition [17,24] in a nonsmooth context. Automatic differentiation is indeed proved to yield conservative fields, which in turn allows one to study the discrete stochastic algorithms that are massively used to train AI systems.…”
Modern problems in AI and in numerical analysis require nonsmooth approaches with a flexible calculus. We introduce generalized derivatives called conservative fields, for which we develop a calculus and provide representation formulas. Functions having a conservative field are called path differentiable: convex, concave, Clarke regular, and all semialgebraic Lipschitz continuous functions are path differentiable. Using Whitney stratification techniques for semialgebraic and definable sets, our model provides variational formulas for nonsmooth automatic differentiation oracles, such as the well-known backpropagation algorithm in deep learning. Our differential model is applied to establish the convergence in values of nonsmooth stochastic gradient methods as they are implemented in practice.
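The point about automatic differentiation can be made concrete with a toy example: applied to a nonsmooth program such as a ReLU network, reverse-mode differentiation returns an element of a conservative field, and the value it returns at a kink depends on a hard-coded convention. The sketch below assumes a scalar one-neuron loss and a hypothetical relu_prime helper whose behavior at zero is a free parameter:

```python
import numpy as np

def relu(t):
    return np.maximum(t, 0.0)

def relu_prime(t, at_zero=0.0):
    # Any fixed value in [0, 1] at t == 0 gives a valid selection of a
    # conservative field for relu; frameworks hard-code one convention.
    t = np.asarray(t, dtype=float)
    return np.where(t > 0, 1.0, np.where(t == 0, at_zero, 0.0))

def backprop_loss_grad(w, x, y, at_zero=0.0):
    """Chain-rule 'gradient' of (relu(w*x) - y)**2 in the scalar case."""
    pre = w * x
    return 2.0 * (relu(pre) - y) * relu_prime(pre, at_zero) * x

# At the kink (w*x == 0) different conventions give different outputs,
# yet each is an element of a conservative field for the loss, so the
# convergence theory covers whichever value the oracle returns.
print(backprop_loss_grad(w=0.0, x=1.0, y=1.0, at_zero=0.0))   # 0.0
print(backprop_loss_grad(w=0.0, x=1.0, y=1.0, at_zero=1.0))   # -2.0
```

At the kink the two conventions disagree, and neither output need be a Clarke subgradient of the composite loss; the conservative-field formalism is what certifies that training with either output still converges.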
“…Our current work sits within the broader scope of analyzing subgradient and proximal methods for weakly convex problems [9, 11, 13-16, 25, 26]; see also the recent survey [12]. In particular, the paper [9] proves a global sublinear rate of convergence, in terms of a natural stationarity measure, of a (stochastic) subgradient method on any weakly convex function. In contrast, here we are interested in subgradient methods that are locally linearly convergent under the additional sharpness assumption.…”
Subgradient methods converge linearly on a convex function that grows sharply away from its solution set. In this work, we show that the same is true for sharp functions that are only weakly convex, provided that the subgradient methods are initialized within a fixed tube around the solution set. A variety of statistical and signal processing tasks come equipped with good initialization, and provably lead to formulations that are both weakly convex and sharp. Therefore, in such settings, subgradient methods can serve as inexpensive local search procedures. We illustrate the proposed techniques on phase retrieval and covariance estimation problems.
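One step rule studied in this line of work is the Polyak subgradient step, which uses the known minimal value (zero for noiseless phase retrieval). The sketch below runs it on a synthetic phase retrieval instance; the problem sizes, the initialization radius, and all variable names are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
m, d = 300, 10
A = rng.standard_normal((m, d))
x_star = rng.standard_normal(d)
b = (A @ x_star) ** 2                      # noiseless phase retrieval data

def f_and_subgrad(x):
    """f(x) = mean |(a_i^T x)^2 - b_i|: weakly convex and sharp."""
    r = (A @ x) ** 2 - b
    val = np.mean(np.abs(r))
    g = A.T @ (2.0 * np.sign(r) * (A @ x)) / m
    return val, g

x = x_star + 0.1 * rng.standard_normal(d)  # good initialization: inside the tube
for _ in range(200):
    val, g = f_and_subgrad(x)
    if val == 0.0:
        break
    x -= (val / (g @ g)) * g               # Polyak step; the minimal value is 0

print(min(np.linalg.norm(x - x_star), np.linalg.norm(x + x_star)))
```

The printed distance accounts for the sign ambiguity of phase retrieval; with the good initialization the iterates contract toward ±x_star at a linear rate, as the theory above predicts.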
“…Thus, one might hope to prove convergence results for a GS algorithm (with predetermined stepsizes rather than line searches) that parallel convergence theory for stochastic gradient methods. Recent work by Davis, Drusvyatskiy, Kakade and Lee [DDKL18] gives convergence results for stochastic subgradient methods on a broad class of problems.…”
This paper reviews the gradient sampling methodology for solving nonsmooth, nonconvex optimization problems. An intuitively straightforward gradient sampling algorithm is stated and its convergence properties are summarized. Throughout this discussion, we emphasize the simplicity of gradient sampling as an extension of the steepest descent method for minimizing smooth objectives. We then provide overviews of various enhancements that have been proposed to improve practical performance, as well as of several extensions that have been made in the literature, such as to solve constrained problems. The paper also includes clarification of certain technical aspects of the analysis of gradient sampling algorithms, most notably related to the assumptions one needs to make about the set of points at which the objective is continuously differentiable. Finally, we discuss possible future research directions.
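At its core, a gradient sampling step evaluates gradients at randomized points near the current iterate (where the objective is differentiable with probability one) and takes the minimal-norm element of the convex hull of those gradients as a search direction. The following is a minimal sketch of that single step on f(x) = ||x||_1, with the small quadratic program solved via scipy; the sampling radius, sample count, and helper names are illustrative, and the line search and radius-reduction logic of the full algorithm are omitted:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)

def grad_f(x):
    # Gradient of f(x) = ||x||_1 wherever it is differentiable;
    # randomly sampled points avoid the kinks with probability one.
    return np.sign(x)

def gs_direction(x, eps=0.1, n_samples=20):
    """Minimal-norm element of the convex hull of sampled gradients."""
    pts = x + eps * rng.uniform(-1, 1, size=(n_samples, x.size))
    G = np.array([grad_f(p) for p in pts] + [grad_f(x)])
    obj = lambda lam: np.sum((lam @ G) ** 2)
    cons = {"type": "eq", "fun": lambda lam: lam.sum() - 1.0}
    lam0 = np.full(len(G), 1.0 / len(G))
    res = minimize(obj, lam0, bounds=[(0, 1)] * len(G), constraints=cons)
    return -(res.x @ G)

# Near the minimizer the sampled gradients straddle the kinks, their
# convex hull contains (approximately) zero, and the direction norm
# shrinks: a certificate of approximate Clarke stationarity.
x = np.array([0.02, -0.01])
print(np.linalg.norm(gs_direction(x)))
```

The appeal of the method, as the survey emphasizes, is exactly this simplicity: only gradient evaluations at nearby points are needed, mirroring steepest descent in the smooth case.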