2018
DOI: 10.1088/1361-6420/aaea2a

On the regularizing property of stochastic gradient descent

Abstract: Stochastic gradient descent (SGD) and its variants are among the most successful approaches for solving large-scale optimization problems. At each iteration, SGD employs an unbiased estimator of the full gradient computed from a single randomly selected data point. Hence, it scales well with problem size, is very attractive for handling truly massive datasets, and holds significant potential for solving large-scale inverse problems. In this work, we rigorously establish its regularizing property under a p…
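For orientation only, here is a minimal sketch of the iteration the abstract describes, for a discretized linear inverse problem A x = y^δ with n equations: each step picks one data point (one row of A) uniformly at random and takes a gradient step for that single residual, which is an unbiased estimator of the full gradient of (1/2n)‖Ax − y^δ‖². The matrix, data, step-size schedule, and iteration count below are illustrative assumptions, not taken from the paper.

import numpy as np

def sgd_linear_inverse(A, y_delta, x0, num_iters=1000, eta0=1.0, alpha=0.5, rng=None):
    # Sketch of SGD for the least-squares functional (1/2n) * ||A x - y_delta||^2.
    # The schedule eta_k = eta0 / (k + 1)^alpha and the fixed iteration count
    # are illustrative choices, not the paper's prescription.
    rng = np.random.default_rng() if rng is None else rng
    n = A.shape[0]
    x = x0.astype(float)
    for k in range(num_iters):
        i = rng.integers(n)                    # one randomly selected data point
        residual = A[i] @ x - y_delta[i]       # scalar residual for that row
        grad_est = residual * A[i]             # unbiased estimator of the full gradient
        x = x - (eta0 / (k + 1) ** alpha) * grad_est
    return x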

Cited by 28 publications (44 citation statements). References 64 publications.

“…The stochastic gradient descent algorithm [23,36,37] is used to optimize equation (3). For better accuracy in prediction, the algorithm loops through all ratings in the training data and estimates the model parameters.…”
Section: Mathematical Modeling of the S3D Video Recommendation System (mentioning)
confidence: 99%
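To make the quoted description concrete, here is a hypothetical sketch of that kind of loop: SGD over (user, item, rating) triples for a regularized matrix-factorization model. The cited paper's equation (3) is not reproduced in the quote, so the squared-error objective, the factor dimension, and the parameter names below are assumptions for illustration only.

import numpy as np

def sgd_rating_factorization(ratings, n_users, n_items, n_factors=10,
                             lr=0.01, reg=0.05, n_epochs=20, rng=None):
    # Hypothetical stand-in for the cited model: minimize the regularized
    # squared error (r_ui - p_u . q_i)^2 + reg * (|p_u|^2 + |q_i|^2), summed over
    # ratings, by looping through all training ratings and updating the factors.
    rng = np.random.default_rng() if rng is None else rng
    P = 0.1 * rng.standard_normal((n_users, n_factors))   # user latent factors
    Q = 0.1 * rng.standard_normal((n_items, n_factors))   # item latent factors
    for _ in range(n_epochs):
        for u, i, r in ratings:                 # loop through all ratings in the training data
            pu, qi = P[u].copy(), Q[i].copy()
            err = r - pu @ qi                   # prediction error for this rating
            P[u] += lr * (err * qi - reg * pu)  # SGD step on the user factors
            Q[i] += lr * (err * pu - reg * qi)  # SGD step on the item factors
    return P, Q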
“…However, they do not give a rate of convergence, which remains an open problem. Numerically, we observe that the convergence rate obtained by the discrepancy principle is nearly order-optimal for low-regularity solutions, as the a priori rule in the regime in [JL19], and the performance is competitive with the standard Landweber method. Thus, the method is especially attractive for finding a low-accuracy solution.…”
Section: Proofs (mentioning)
confidence: 93%
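For context, the discrepancy principle referred to above stops the iteration at the first index whose residual norm falls below a multiple of the noise level δ. Below is a minimal sketch of such an a posteriori stopping rule wrapped around a generic SGD iteration; τ, the step-size schedule, and the per-step residual check are illustrative assumptions, not the cited analysis verbatim.

import numpy as np

def sgd_discrepancy_stop(A, y_delta, delta, x0, tau=1.1,
                         eta0=1.0, alpha=0.5, max_iters=100_000, rng=None):
    # Stop at the first iterate with ||A x_k - y_delta|| <= tau * delta (tau > 1).
    # Sketch only: the SGD update and step sizes are generic choices.
    rng = np.random.default_rng() if rng is None else rng
    n = A.shape[0]
    x = x0.astype(float)
    for k in range(max_iters):
        if np.linalg.norm(A @ x - y_delta) <= tau * delta:   # discrepancy principle check
            return x, k
        i = rng.integers(n)
        x = x - (eta0 / (k + 1) ** alpha) * (A[i] @ x - y_delta[i]) * A[i]
    return x, max_iters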
“…Remark 3.2.3. The condition r < 1 is related to an apparent saturation phenomenon with SGD: for any ν > 1, the SGD iterate x δ k with a priori stopping can only achieve a convergence rate comparable with that for ν = 1 in the setting of Assumption 3.1.1, at least for the current analysis [JL19]. It remains unclear whether this is an intrinsic drawback of SGD or due to limitations of the proof technique.…”
Section: Proofs (mentioning)
confidence: 94%
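For readers unfamiliar with the notation in the remark: ν indexes the smoothness of the exact solution through a source condition, and the rate below is the standard benchmark. The precise form of Assumption 3.1.1 is not reproduced in the quote, so the following LaTeX only records the usual setting:

x^{\dagger} - x_0 = (A^{*}A)^{\nu} w, \qquad \|w\| \le \rho \quad (\text{source condition of index } \nu),
\mathbb{E}\bigl[\|x^{\delta}_{k(\delta)} - x^{\dagger}\|\bigr] = O\bigl(\delta^{\frac{2\nu}{2\nu+1}}\bigr) \quad \text{for } 0 < \nu \le 1 .

As the remark notes, for ν > 1 the current analysis in [JL19] still yields only the ν = 1 rate O(δ^{2/3}); this apparent saturation is the phenomenon in question.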