2018
DOI: 10.1137/16m1094415
Global Convergence Rate of Proximal Incremental Aggregated Gradient Methods

Abstract: We study the convergence rate of the proximal incremental aggregated gradient (PIAG) method for minimizing the sum of a large number of smooth component functions (where the sum is strongly convex) and a non-smooth convex function. At each iteration, the PIAG method moves along an aggregated gradient, formed by incrementally updating gradients of component functions at least once in the last K iterations, and takes a proximal step with respect to the non-smooth function. We show that the PIAG algorithm attains a…
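The PIAG update described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes cyclic component order (so each gradient is refreshed at least once every K = m iterations), a constant stepsize gamma, and a user-supplied proximal operator; all names are illustrative.

```python
import numpy as np

def piag(grads, prox, x0, gamma, n_iters):
    """Sketch of the proximal incremental aggregated gradient (PIAG) method.

    grads : list of gradient functions, one per smooth component f_i
    prox  : proximal operator of the non-smooth term, called as prox(v, gamma)
    Cyclic order is assumed, so every component gradient is refreshed at
    least once in the last K = len(grads) iterations.
    """
    m = len(grads)
    x = np.asarray(x0, dtype=float).copy()
    table = [g(x) for g in grads]          # last evaluated gradient of each component
    agg = np.sum(table, axis=0)            # aggregated (possibly stale) gradient
    for k in range(n_iters):
        i = k % m                          # cyclic component selection
        fresh = grads[i](x)
        agg += fresh - table[i]            # incremental update of the aggregate
        table[i] = fresh
        x = prox(x - gamma * agg, gamma)   # proximal step w.r.t. the non-smooth part
    return x
```

With the prox set to the identity (no non-smooth term) and components f_i(x) = ½(x − a_i)², the iterates converge to the mean of the a_i, the minimizer of the sum.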

Cited by 36 publications (28 citation statements); references 31 publications (34 reference statements).
“…Finally, Lemma 3.2 from [16] shows that the sequence {h^(k)}_{k≥1} converges if the factor 1 − √(µγ) + O(γ) is less than one. For large m, the rate in (4.34) is worse than the rate of 1 − O(1/(mκ(F))) analyzed in [29] for IAG. The analysis above shows that the theoretical convergence rate may not be improved using the acceleration technique that we have applied in A-CIAG, as long as the same analysis framework for A-CIAG is adopted.…”
Section: Discussion
confidence: 97%
“…As analyzed in [16,12], due to the dependence on m² on the right-hand side, the sequence of squared norms {‖θ_k − θ⋆‖²}_{k≥1} converges only when γ = O(1/m), and this shows that ‖θ_k − θ⋆‖² converges linearly at rate 1 − O(1/(m²κ(F))). Note that a recent work [29] has strengthened this rate to 1 − O(1/(mκ(F))), and it is possible to further improve the rate by studying the expected convergence when the component function selection is random, e.g., [27].…”
Section: Gradient Tracking in Incremental
confidence: 99%
“…where λ_k is the stepsize at each iteration. The proximal gradient method and its variants [13,14,15,16,17,18] have long been an active topic in the optimization field, owing to their simple form and low computational complexity.…”
Section: Introduction
confidence: 99%
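The proximal gradient update quoted above, x_{k+1} = prox_{λ_k h}(x_k − λ_k ∇f(x_k)), can be sketched as below. The soft-thresholding prox (for an ℓ1 term) is an illustrative choice, not taken from the quoted paper, and a constant stepsize is assumed for simplicity.

```python
import numpy as np

def soft_threshold(v, t):
    # proximal operator of t * ||v||_1
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def proximal_gradient(grad_f, prox_h, x0, step, n_iters):
    # x_{k+1} = prox_{step*h}(x_k - step * grad_f(x_k))
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(n_iters):
        x = prox_h(x - step * grad_f(x), step)
    return x
```

For example, minimizing ½(x − 3)² + |x| with this scheme converges to x = 2, where the gradient of the smooth part balances the subgradient of the absolute value.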
“…However, the linear convergence constants of these methods are not necessarily better than the linear convergence constant of GD for a problem with a comparable condition number. This leaves open the possibility that the worst-case performance of these methods is worse than the worst-case performance of GD; see Section 2. Given that the only difference between stochastic and incremental methods is that the latter choose functions in a cyclic order, rather than uniformly at random as stochastic methods do, it is not surprising that analogous statements can be made for incremental gradient descent (IGD) methods [1,33,22,24,3,25,13,2,34,11,35]. Standard IGD has a slow sublinear convergence rate, which motivates the introduction of memory.…”
confidence: 99%
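The distinction drawn in the quote above, cyclic versus uniformly random component selection, amounts to one line in an incremental gradient loop. A minimal sketch, with illustrative names and a constant stepsize:

```python
import numpy as np

def igd(grads, x0, step, n_iters, order="cyclic", seed=0):
    # Incremental gradient descent: one component gradient per iteration.
    # order="cyclic" is the deterministic incremental method; order="random"
    # samples components uniformly, as in stochastic gradient methods.
    rng = np.random.default_rng(seed)
    m = len(grads)
    x = np.asarray(x0, dtype=float).copy()
    for k in range(n_iters):
        i = k % m if order == "cyclic" else int(rng.integers(m))
        x = x - step * grads[i](x)
    return x
```

With a constant stepsize, either variant converges only to a neighborhood of the minimizer whose radius shrinks with the stepsize, which is the slow behavior that memory-based methods like IAG/PIAG are designed to avoid.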