2019
DOI: 10.1137/17m1147846
Convergence Rate of Incremental Gradient and Incremental Newton Methods

Abstract: The incremental gradient method is a prominent algorithm for minimizing a finite sum of smooth convex functions, used in many contexts including large-scale data processing applications and distributed optimization over networks. It is a first-order method that processes the functions one at a time based on their gradient information. The incremental Newton method, on the other hand, is a second-order variant which exploits additionally the curvature information of the underlying functions and can therefore be…
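To make the abstract's description concrete, the sketch below implements, in Python, a plain incremental gradient pass and a simplified incremental Newton-type pass on assumed quadratic components. The data, the decaying step-size schedule, and the way curvature is accumulated are illustrative assumptions for this sketch, not the authors' exact algorithms.

import numpy as np

# Illustrative strongly convex quadratic components f_i(x) = 0.5 * x^T A_i x - b_i^T x;
# the data and schedules below are assumptions made for this sketch only.
rng = np.random.default_rng(0)
d, m = 5, 10
A = [np.eye(d) + 0.1 * rng.standard_normal((d, d)) for _ in range(m)]
A = [0.5 * (Ai + Ai.T) + np.eye(d) for Ai in A]   # symmetrize and shift to keep each A_i positive definite
b = [rng.standard_normal(d) for _ in range(m)]

def grad_i(x, i):            # gradient of the i-th component
    return A[i] @ x - b[i]

def hess_i(x, i):            # Hessian of the i-th component (constant for quadratics)
    return A[i]

def incremental_gradient(x0, cycles, c=0.5):
    """First-order method: one gradient step per component, visited in a
    fixed cyclic order, with an assumed decaying step size alpha_k = c/k."""
    x, k = x0.copy(), 1
    for _ in range(cycles):
        for i in range(m):
            x -= (c / k) * grad_i(x, i)
            k += 1
    return x

def incremental_newton(x0, cycles):
    """Second-order variant: scale each component step by the inverse of the
    running sum of component Hessians, a simplified illustration of how
    curvature information can be exploited incrementally."""
    x = x0.copy()
    H = np.zeros((d, d))
    for _ in range(cycles):
        for i in range(m):
            H += hess_i(x, i)
            x -= np.linalg.solve(H, grad_i(x, i))
    return x

x_ig = incremental_gradient(np.zeros(d), cycles=200)
x_in = incremental_newton(np.zeros(d), cycles=200)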

Cited by 24 publications (35 citation statements)
References 39 publications (42 reference statements)
“…It analytically showed that, for strongly convex objective functions, the convergence rate under random reshuffling can be improved from O(1/i) in vanilla SGD [25] to O(1/i^2). The incremental gradient methods [26], [27], which can be viewed as the deterministic version of random reshuffling, share similar conclusions, i.e., random reshuffling helps accelerate the convergence rate from O(1/i) to O(1/i^2) under decaying step-sizes. Also, the work [24] establishes that random reshuffling will not degrade performance relative to the stochastic gradient descent implementation, provided the number of epochs is not too large.…”
Section: Motivation
confidence: 78%
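The O(1/i) versus O(1/i^2) comparison quoted above can be illustrated on a toy problem. The sketch below, written under assumed data, step-size constant, and epoch count (none of which come from the cited works), runs the same incremental update with three visiting orders: sampling with replacement, random reshuffling, and a fixed cyclic order, all with the decaying step size alpha_k = c/k.

import numpy as np

# Toy strongly convex finite sum f(x) = sum_i 0.5 * (a_i^T x - y_i)^2, used only to
# illustrate the rate comparison; the data, the step-size constant c, and the
# epoch count are assumptions made for this sketch.
rng = np.random.default_rng(1)
m, d = 50, 5
Adata = rng.standard_normal((m, d))
Adata /= np.linalg.norm(Adata, axis=1, keepdims=True)   # unit-norm rows keep single steps stable
y = rng.standard_normal(m)
x_star = np.linalg.lstsq(Adata, y, rcond=None)[0]       # minimizer of the finite sum

def grad(x, i):                                         # gradient of the i-th component
    return (Adata[i] @ x - y[i]) * Adata[i]

def run(order_fn, epochs, c=1.0):
    """Incremental updates x <- x - (c/k) * grad f_i(x); order_fn(epoch)
    returns the order in which the components are visited in that epoch."""
    x, k, errs = np.zeros(d), 1, []
    for e in range(epochs):
        for i in order_fn(e):
            x -= (c / k) * grad(x, i)
            k += 1
        errs.append(np.linalg.norm(x - x_star))
    return errs

with_replacement = run(lambda e: rng.integers(0, m, size=m), epochs=500)  # vanilla SGD-style sampling
reshuffled       = run(lambda e: rng.permutation(m), epochs=500)          # random reshuffling
cyclic           = run(lambda e: range(m), epochs=500)                    # deterministic incremental gradient

# The reshuffled and cyclic runs typically show a visibly faster decay of
# ||x - x*|| than sampling with replacement, in line with the rate discussion above.
print(with_replacement[-1], reshuffled[-1], cyclic[-1])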
“…To proceed, we will ignore the last two terms in (27) and consider the following approximate model, which we shall refer to as a long-term model.…”
Section: A. Error Dynamics
confidence: 99%
“…For example, in [13], another variance-reduction algorithm is proposed under reshuffling; however, no proof of convergence is provided. The closest attempts at proof are the useful arguments given in [14], [15], which deal with special problem formulations. The work [14] deals with the case of incremental aggregated gradients, which corresponds to a deterministic version of RR for SAG, while the work [15] deals with SVRG in the context of ridge regression problems using regret analysis.…”
Section: Introduction
confidence: 99%
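For context on the incremental aggregated gradient idea mentioned in connection with [14]: the method keeps the most recently computed gradient of every component in memory, refreshes one entry per iteration in a cyclic order, and steps along the average of the stored gradients. The sketch below illustrates this idea on assumed least-squares data; the step size, problem sizes, and names are illustrative and not taken from [14] or [15].

import numpy as np

# Illustrative least-squares components f_i(x) = 0.5 * (a_i^T x - y_i)^2;
# data, step size, and iteration count are assumptions for this sketch.
rng = np.random.default_rng(2)
m, d = 30, 4
A = rng.standard_normal((m, d))
A /= np.linalg.norm(A, axis=1, keepdims=True)       # unit-norm rows for a safe constant step
y = rng.standard_normal(m)

def grad(x, i):                                     # gradient of the i-th component
    return (A[i] @ x - y[i]) * A[i]

def iag(x0, iters, alpha=0.1):
    """Incremental aggregated gradient sketch: refresh one stored component
    gradient per iteration (cyclic order) and move along the average of all
    stored gradients."""
    x = x0.copy()
    memory = np.array([grad(x, i) for i in range(m)])   # table of most recent component gradients
    for k in range(iters):
        i = k % m                                       # deterministic cyclic pick
        memory[i] = grad(x, i)                          # refresh this component's stored gradient
        x -= alpha * memory.mean(axis=0)                # step along the aggregated gradient
    return x

x_hat = iag(np.zeros(d), iters=5000)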