2020
DOI: 10.1109/tnnls.2019.2952219

Stochastic Gradient Descent for Nonconvex Learning Without Bounded Gradient Assumptions

Abstract: Stochastic gradient descent (SGD) is a popular and efficient method with wide applications in training deep neural nets and other nonconvex models. While the behavior of SGD is well understood in the convex learning setting, the existing theoretical results for SGD applied to nonconvex objective functions are far from mature. For example, existing results require imposing a nontrivial assumption on the uniform boundedness of gradients for all iterates encountered in the learning process, which is hard to verify…
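
As a point of reference for the setting the abstract describes, the following is a minimal sketch of plain SGD on a toy nonconvex objective with a polynomially decaying step size; the objective, constants, and variable names are illustrative assumptions, not the paper's algorithm or experiments.

# Minimal sketch (assumed toy example, not the paper's algorithm):
# SGD on the nonconvex objective F(w) = E[(sin(w) - z)^2], z ~ N(0.5, 0.1^2),
# with step size eta_t = eta_1 / t**alpha, alpha in (0, 1].
import numpy as np

rng = np.random.default_rng(0)

def stochastic_gradient(w):
    z = rng.normal(loc=0.5, scale=0.1)           # one noisy sample
    return 2.0 * (np.sin(w) - z) * np.cos(w)     # d/dw of (sin(w) - z)^2

w = 3.0                      # initial iterate
eta_1, alpha = 0.5, 1.0      # schedule parameters (illustrative)

for t in range(1, 10_001):
    eta_t = eta_1 / t ** alpha
    w -= eta_t * stochastic_gradient(w)

print(f"final iterate: {w:.3f}, sin(w): {np.sin(w):.3f}")  # sin(w) settles near 0.5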

Cited by 73 publications (59 citation statements)
References 19 publications
“…Our goal here is to move towards a more general theory of convergence that combines all of these threads under a single analysis framework. Specifically, by innovating on the strategies of Lei et al [2019] and Patel [2020], we will prove the following results for SGD with matrix-valued learning rates, which we state informally now and formalize later.…”
Section: Contributions (mentioning)
confidence: 98%
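
One way to read "SGD with matrix-valued learning rates" in the quote above is preconditioned SGD, where the scalar step size is replaced by a positive-definite matrix applied to the stochastic gradient. The Python sketch below illustrates only that mechanic on a least-squares problem; the fixed diagonal preconditioner and the 1/t schedule are assumptions for illustration, not the construction of the cited work.

# Illustrative sketch: SGD with a matrix-valued learning rate D (positive
# definite), i.e. w_{t+1} = w_t - eta_t * D @ g_t. Not the cited construction.
import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 5
A = rng.normal(size=(n, d))
b = rng.normal(size=n)

def stochastic_gradient(w):
    i = rng.integers(n)                 # sample one data point
    residual = A[i] @ w - b[i]
    return 2.0 * residual * A[i]        # gradient of (a_i . w - b_i)^2

# Fixed positive-definite diagonal preconditioner (inverse per-coordinate
# second moments); more generally D could change with t.
D = np.diag(n / (np.linalg.norm(A, axis=0) ** 2))

w = np.zeros(d)
for t in range(1, 5_001):
    w -= (1.0 / t) * (D @ stochastic_gradient(w))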
“…with α ∈ (0, 1], the objective function, F, evaluated at the SGD iterates converges almost surely to a bounded random variable. Moreover, Lei et al [2019] show that, when α = 1, the expected value of the norm of the gradient, ∇F, evaluated at the SGD iterates converges to zero.…”
Section: Introduction (mentioning)
confidence: 96%
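
Restated in display form, the two statements in the quote read roughly as follows (my paraphrase; the precise conditions are in Lei et al [2019] and the citing work):

\[
\eta_t = \frac{\eta_1}{t^{\alpha}}, \quad \alpha \in (0,1]
\;\Longrightarrow\;
F(w_t) \xrightarrow{\ \mathrm{a.s.}\ } F_\infty \ \text{(a bounded random variable)},
\]
\[
\alpha = 1
\;\Longrightarrow\;
\lim_{t \to \infty} \mathbb{E}\bigl[\lVert \nabla F(w_t) \rVert\bigr] = 0 .
\]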
“…It is usually solved in practice using toolboxes such as Scikit-Learn or TensorFlow, by means of a stochastic gradient descent method for which a full convergence proof under realistic assumptions is, to our knowledge, still unknown. See Bottou et al [2018] or E et al [2020] for recent in-depth reviews of these subjects, and Ghadimi and Lan [2013], Lei et al [2019], Fehrman et al [2020] for results for a nonconvex function.…”
Section: Description of the Algorithm (mentioning)
confidence: 99%
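
As a concrete instance of the toolbox usage the quote mentions, the Python snippet below runs scikit-learn's SGDClassifier with an inverse-scaling step size on synthetic data; the dataset and hyperparameters are purely illustrative and not taken from the cited work.

# Illustrative scikit-learn SGD run (assumed example, not from the cited work).
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

clf = SGDClassifier(
    loss="log_loss",             # logistic loss ("log" in older scikit-learn)
    learning_rate="invscaling",  # eta_t = eta0 / t**power_t
    eta0=0.1,
    power_t=0.5,
    max_iter=1000,
    random_state=0,
)
clf.fit(X, y)
print("training accuracy:", clf.score(X, y))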
“…There are four major approaches to handling nonconvex optimization problems. The first group of methods is based on stochastic gradient descent, and its convergence rates are far from those of other methods, as described in (Lei et al 2020). The second approach is to view nonconvex functions as a difference of convex functions, but this framework does not apply to every class of nonconvex problems.…”
Section: Introduction (mentioning)
confidence: 99%
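
For the "difference of convex functions" framework mentioned in the quote, a one-dimensional illustration (my own example, not from the cited work) is:

\[
f(x) \;=\; x^{4} - x^{2}
      \;=\; \underbrace{x^{4}}_{g(x)\ \text{convex}}
      \;-\; \underbrace{x^{2}}_{h(x)\ \text{convex}},
\]

so the nonconvex double-well function f is written as a difference of two convex functions g and h, even though f itself is not convex.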