2020
DOI: 10.1109/tsp.2020.2968280
Variance-Reduced Stochastic Learning Under Random Reshuffling

Abstract: Several useful variance-reduced stochastic gradient algorithms, such as SVRG, SAGA, Finito, and SAG, have been proposed to minimize empirical risks with linear convergence properties to the exact minimizer. The existing convergence results assume uniform data sampling with replacement. However, it has been observed in related works that random reshuffling can deliver superior performance over uniform sampling and, yet, no formal proofs or guarantees of exact convergence exist for variance-reduced algorithms un…
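To make the distinction in the abstract concrete, the following is a minimal sketch (not from the paper) contrasting uniform sampling with replacement against random reshuffling in plain SGD; the least-squares data, quadratic loss, and step size are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative least-squares data (placeholder problem, not from the paper).
N, d = 100, 5
X = rng.standard_normal((N, d))
y = X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(N)

def grad(w, n):
    """Gradient of the per-sample quadratic loss Q(w; x_n) = 0.5*(x_n^T w - y_n)^2."""
    return (X[n] @ w - y[n]) * X[n]

mu, epochs = 0.01, 50
w_uniform = np.zeros(d)
w_reshuffle = np.zeros(d)

for _ in range(epochs):
    # Uniform sampling with replacement: N independent draws per epoch.
    for n in rng.integers(0, N, size=N):
        w_uniform -= mu * grad(w_uniform, n)

    # Random reshuffling: one random permutation per epoch,
    # so every sample is visited exactly once.
    for n in rng.permutation(N):
        w_reshuffle -= mu * grad(w_reshuffle, n)
```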

Cited by 13 publications (10 citation statements)
References 20 publications
“…$\{x_{1,n}\}_{n=1}^{N_1},\ \{x_{2,n}\}_{n=1}^{N_2},\ \cdots,\ \{x_{K,n}\}_{n=1}^{N_K}$, (1) where $N = \sum_{k=1}^{K} N_k$. We consider minimizing an empirical risk function, $J(w)$, which is defined as the sample average of loss values over all observed data samples in the network: $J(w) \triangleq \frac{1}{N}\sum_{n=1}^{N} Q(w; x_n)$. Here, the notation $Q(w; x_n)$ denotes the loss value evaluated at $w$ and the $n$-th sample, $x_n$.…”
Section: A Problem Formulation
confidence: 99%
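As a quick illustration of the empirical risk defined in this excerpt, the sketch below evaluates $J(w)$ as the sample average of per-sample losses over data held by K agents; the quadratic loss and the synthetic per-agent datasets are assumptions, not part of the cited work.

```python
import numpy as np

rng = np.random.default_rng(1)

# K = 3 agents, each holding N_k samples (sizes are illustrative).
datasets = [rng.standard_normal((N_k, 4)) for N_k in (10, 20, 15)]
targets = [rng.standard_normal(len(D)) for D in datasets]
N = sum(len(D) for D in datasets)

def Q(w, x, y):
    """Per-sample loss Q(w; x_n); a quadratic loss is assumed here."""
    return 0.5 * (x @ w - y) ** 2

def J(w):
    """Empirical risk: sample average of losses over all data in the network."""
    total = sum(Q(w, x, y) for D, t in zip(datasets, targets) for x, y in zip(D, t))
    return total / N

print(J(np.zeros(4)))
```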
“…First, we derive a fully-decentralized variance-reduced stochastic-gradient algorithm with significantly reduced memory requirements. We refer to the technique as the diffusion-AVRG method (where AVRG stands for the "amortized variance-reduced gradient" method proposed in the related work [1] for single-agent learning). Unlike DSA [32], the proposed method does not require extra memory to store gradient estimates.…”
Section: Contribution
confidence: 99%
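The memory point made here can be illustrated with an SVRG-style construction, which keeps only a snapshot iterate and one aggregate gradient instead of a per-sample gradient table. The sketch below is plain SVRG on an assumed least-squares problem, not the diffusion-AVRG or AVRG recursion of the cited papers.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative least-squares problem (not taken from the cited papers).
N, d = 50, 3
X = rng.standard_normal((N, d))
y = X @ rng.standard_normal(d)

def grad(w, n):
    """Gradient of Q(w; x_n) = 0.5*(x_n^T w - y_n)^2."""
    return (X[n] @ w - y[n]) * X[n]

mu, epochs = 0.05, 30
w = np.zeros(d)

for _ in range(epochs):
    w_snap = w.copy()
    # Single aggregate vector: the full gradient at the snapshot.
    full_grad = sum(grad(w_snap, n) for n in range(N)) / N
    for n in rng.permutation(N):  # inner pass in reshuffled order
        # Variance-reduced direction; no per-sample gradient table is stored.
        w = w - mu * (grad(w, n) - grad(w_snap, n) + full_grad)
```

The per-sample table used by SAGA or DSA is avoided here at the cost of an extra full-gradient pass per epoch; the AVRG idea referenced in the excerpt amortizes that computation across the epoch, as its name suggests.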
“…The boldface notation for the symbols w and σ in (3) emphasizes the random nature of these variables due to the randomness in the permutation operation. While the samples over one epoch are no longer picked independently from each other, the uniformity of the permutation function implies the following useful properties [19], [22], [23]:…”
Section: Motivation
confidence: 99%
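A small empirical check of the kind of uniformity property this excerpt alludes to: under a uniformly random permutation, each position in an epoch is equally likely to hold any given sample, even though the draws within an epoch are no longer independent. The parameters below are illustrative and not taken from the cited references.

```python
import numpy as np

rng = np.random.default_rng(3)
N, trials = 8, 100_000

# counts[i, n] = how often sample n is drawn at position i of the epoch.
counts = np.zeros((N, N))
for _ in range(trials):
    sigma = rng.permutation(N)            # one reshuffled epoch
    counts[np.arange(N), sigma] += 1

# Each entry should be close to 1/N, i.e. P[sigma(i) = n] = 1/N for every i, n,
# even though the positions within an epoch are dependent on each other.
print(np.round(counts / trials, 3))
```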
“…There is a family of variance-reduction algorithms such as SVRG [55], SAGA [44], and AVRG [56] that can approach the exact solution of the empirical risk function with constant stepsize. In this work, we exploit the SAGA construction because the variables $\{u_{n,k}\}$ can readily be used in that implementation.…”
Section: B Variance-Reduction Algorithm
confidence: 99%
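For reference, a minimal single-agent SAGA sketch in its standard form (fresh gradient minus stored gradient plus the table average). The least-squares data, step size, and zero-initialized table are assumptions, and the citing paper's variables $\{u_{n,k}\}$ are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(4)

# Illustrative least-squares problem (not taken from the cited papers).
N, d = 50, 3
X = rng.standard_normal((N, d))
y = X @ rng.standard_normal(d)

def grad(w, n):
    """Gradient of Q(w; x_n) = 0.5*(x_n^T w - y_n)^2."""
    return (X[n] @ w - y[n]) * X[n]

mu, steps = 0.05, 5000
w = np.zeros(d)
table = np.zeros((N, d))          # stored gradient for every sample (SAGA memory)
table_avg = table.mean(axis=0)    # running average of the stored gradients

for _ in range(steps):
    n = rng.integers(N)           # uniform sampling shown here for simplicity
    g_new = grad(w, n)
    # SAGA direction: fresh gradient minus stored gradient plus their average.
    w = w - mu * (g_new - table[n] + table_avg)
    # Replace the stored gradient and update the average incrementally.
    table_avg += (g_new - table[n]) / N
    table[n] = g_new
```

The O(Nd) gradient table in this sketch is exactly the extra memory that the diffusion-AVRG approach quoted above is designed to avoid.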