2021
DOI: 10.48550/arxiv.2108.04755
Preprint

FedPAGE: A Fast Local Stochastic Gradient Method for Communication-Efficient Federated Learning

Haoyu Zhao,
Zhize Li,
Peter Richtárik

Abstract: Federated Averaging (FedAvg, also known as Local-SGD) (McMahan et al., 2017) is a classical federated learning algorithm in which clients run multiple local SGD steps before communicating their update to an orchestrating server. We propose a new federated learning algorithm, FedPAGE, able to further reduce the communication complexity by utilizing the recent optimal PAGE method (Li et al., 2021) instead of plain SGD in FedAvg. We show that FedPAGE uses much fewer communication rounds than previous local method…
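For context, the PAGE estimator referenced in the abstract maintains a recursive gradient estimate that is refreshed with a full (or large-minibatch) gradient only with small probability. A sketch of its standard single-node form, with notation assumed rather than taken from the paper:

\[
v^{t+1} =
\begin{cases}
\nabla f(x^{t+1}) & \text{with probability } p, \\
v^{t} + \frac{1}{b} \sum_{i \in I_b} \big( \nabla f_i(x^{t+1}) - \nabla f_i(x^{t}) \big) & \text{with probability } 1 - p,
\end{cases}
\]

where $I_b$ is a sampled minibatch of size $b$. Per the abstract, FedPAGE plugs this kind of estimator into the clients' local updates in place of the plain SGD steps used by FedAvg.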

Cited by 7 publications (12 citation statements)
References 15 publications
“…Therefore, for this setup we modify EF21 and combine it with variance reduction. In particular, this time we replace $\nabla f_i(x^{t+1})$ in the formula for $c_i^t$ with the PAGE estimator (Li et al., 2021) $v^{t+1}$…”
Section: Methods (mentioning)
confidence: 99%
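A minimal sketch of the mechanism quoted above, assuming the standard EF21 recursion (each client compresses the difference between a gradient estimate and its state $g_i^t$) and using a Top-k sparsifier as a stand-in for a generic contractive compressor; the function and variable names are illustrative, not taken from the cited papers:

```python
import numpy as np

def top_k(v, k):
    """Top-k sparsifier, a common contractive compressor used with EF21."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]   # indices of the k largest-magnitude entries
    out[idx] = v[idx]
    return out

def ef21_client_message(v_new, g_prev, k):
    """One EF21-style client step with a variance-reduced estimate plugged in.

    v_new  -- gradient estimate at the new iterate; in the quoted variant this is
              the PAGE estimator v^{t+1} instead of the exact local gradient.
    g_prev -- the client's current state g_i^t (the server keeps an identical copy).
    Returns the compressed message c_i^t and the updated state g_i^{t+1}.
    """
    c = top_k(v_new - g_prev, k)  # compress only the difference, as in EF21
    g_new = g_prev + c            # the server applies the same additive update
    return c, g_new
```

The only change relative to plain EF21 is what is passed in as v_new: the exact local gradient $\nabla f_i(x^{t+1})$ is swapped for a variance-reduced PAGE estimate.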
“…In the finite-sum setting (3), we enhance EF21 with a variance reduction technique to reduce the computational complexity. In particular, we adopt the simple and efficient variance-reduced method PAGE (Li et al., 2021; Li, 2021b) (which is optimal for solving problems (3)) into EF21, and call the resulting method EF21-PAGE (Algorithm 3). See Appendix E for more details.…”
Section: Our Contributions (mentioning)
confidence: 99%
“…In the nonconvex setting, the functions $\{f_i\}_{i \in [N]}$ are arbitrary functions that satisfy the following standard smoothness assumption (Johnson and Zhang, 2013; Defazio et al., 2014; Nguyen et al., 2017; Zhou et al., 2018; Fang et al., 2018; Li, 2019), and we assume that the unbiased local stochastic gradient oracle $\nabla f_i(x)$ has bounded local variance (McMahan et al., 2017; Zhao et al., 2021), which is also standard in the federated learning literature. For simplicity, we use $\nabla^b f_i(x)$ to denote the stochastic gradient oracle that uses a minibatch of size $b$, which is the average of $b$ independent unbiased stochastic gradient oracles $\nabla f_i(x)$.…”
Section: Assumptions About the Functions (mentioning)
confidence: 99%
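Spelled out in standard notation (writing $\widetilde{\nabla} f_i(x)$ for the stochastic oracle to distinguish it from the true gradient; the symbols here are assumed, not copied from the quoted paper), the two assumptions read roughly as:

\[
\|\nabla f_i(x) - \nabla f_i(y)\| \le L \|x - y\| \quad \text{for all } x, y \text{ and } i \in [N] \qquad \text{(L-smoothness)},
\]
\[
\mathbb{E}\big[\widetilde{\nabla} f_i(x)\big] = \nabla f_i(x), \qquad
\mathbb{E}\big[\|\widetilde{\nabla} f_i(x) - \nabla f_i(x)\|^2\big] \le \sigma^2 \qquad \text{(bounded local variance)},
\]

and the minibatch oracle averages $b$ independent calls, $\nabla^b f_i(x) = \frac{1}{b} \sum_{j=1}^{b} \widetilde{\nabla}^{(j)} f_i(x)$, so its variance is reduced to $\sigma^2 / b$.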
“…Among these algorithms, most of them need to periodically compute full gradients to reduce the variance of the gradient estimator, e.g. SARAH (Nguyen et al., 2017), PAGE, PP-MARINA, FedPAGE (Zhao et al., 2021), except ZeroSARAH (Li and Richtárik, 2021b), which never needs to compute the full gradient. The idea of ZeroSARAH is to maintain another unbiased approximate gradient estimator for the full gradient, and update both the biased recursive and unbiased approximate estimators in each round.…”
Section: Frecon Algorithm (mentioning)
confidence: 99%
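A loose paraphrase of the two-estimator idea described above, written as illustrative Python; this is not the exact ZeroSARAH update (the precise weights and correction terms are given in Li and Richtárik, 2021b), only a sketch of how keeping both a biased recursive estimator and per-component gradient snapshots avoids ever computing a full gradient:

```python
def zerosarah_like_step(grads_new, grads_old, sampled_ids, v_prev, y):
    """Illustrative two-estimator update on a sampled minibatch of components.

    grads_new, grads_old -- gradients of the sampled components at the current
                            and previous iterates (lists of arrays, length b).
    sampled_ids          -- indices of the sampled components in [n].
    v_prev               -- biased SARAH-type recursive estimator from last round.
    y                    -- list of n per-component gradient snapshots whose
                            average approximates the full gradient.
    """
    b = len(sampled_ids)
    # 1) biased recursive (SARAH-type) estimator: previous estimate + minibatch difference
    v_new = v_prev + sum(gn - go for gn, go in zip(grads_new, grads_old)) / b
    # 2) refresh the snapshots for the sampled components only
    for pos, j in enumerate(sampled_ids):
        y[j] = grads_new[pos]
    # 3) approximate full gradient from snapshots -- no full pass over all components
    full_grad_approx = sum(y) / len(y)
    return v_new, full_grad_approx, y
```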