2021
DOI: 10.48550/arxiv.2108.04755
Preprint

FedPAGE: A Fast Local Stochastic Gradient Method for Communication-Efficient Federated Learning

Haoyu Zhao,
Zhize Li,
Peter Richtárik

Abstract: Federated Averaging (FedAvg, also known as Local-SGD) (McMahan et al., 2017) is a classical federated learning algorithm in which clients run multiple local SGD steps before communicating their update to an orchestrating server. We propose a new federated learning algorithm, FedPAGE, able to further reduce the communication complexity by utilizing the recent optimal PAGE method (Li et al., 2021) instead of plain SGD in FedAvg. We show that FedPAGE uses much fewer communication rounds than previous local method…
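For context, the PAGE estimator referenced in the abstract maintains a recursive gradient estimate that is refreshed with a full (or large-minibatch) gradient only with small probability. A sketch of its standard single-node form, with notation assumed rather than taken from the paper:

\[
v^{t+1} =
\begin{cases}
\nabla f(x^{t+1}) & \text{with probability } p, \\
v^{t} + \frac{1}{b} \sum_{i \in I_b} \big( \nabla f_i(x^{t+1}) - \nabla f_i(x^{t}) \big) & \text{with probability } 1 - p,
\end{cases}
\]

where $I_b$ is a sampled minibatch of size $b$. Per the abstract, FedPAGE plugs this kind of estimator into the clients' local updates in place of the plain SGD steps used by FedAvg.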

Cited by 7 publications (12 citation statements)
References 15 publications
“…Therefore, for this setup we modify EF21 and combine it with variance reduction. In particular, this time we replace $\nabla f_i(x^{t+1})$ in the formula for $c_i^t$ with the PAGE estimator (Li et al., 2021) $v^{t+1}$…”
Section: Methods (mentioning)
confidence: 99%
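A minimal sketch of the mechanism quoted above, assuming the standard EF21 recursion (each client compresses the difference between a gradient estimate and its state $g_i^t$) and using a Top-k sparsifier as a stand-in for a generic contractive compressor; the function and variable names are illustrative, not taken from the cited papers:

```python
import numpy as np

def top_k(v, k):
    """Top-k sparsifier, a common contractive compressor used with EF21."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]   # indices of the k largest-magnitude entries
    out[idx] = v[idx]
    return out

def ef21_client_message(v_new, g_prev, k):
    """One EF21-style client step with a variance-reduced estimate plugged in.

    v_new  -- gradient estimate at the new iterate; in the quoted variant this is
              the PAGE estimator v^{t+1} instead of the exact local gradient.
    g_prev -- the client's current state g_i^t (the server keeps an identical copy).
    Returns the compressed message c_i^t and the updated state g_i^{t+1}.
    """
    c = top_k(v_new - g_prev, k)  # compress only the difference, as in EF21
    g_new = g_prev + c            # the server applies the same additive update
    return c, g_new
```

The only change relative to plain EF21 is what is passed in as v_new: the exact local gradient $\nabla f_i(x^{t+1})$ is swapped for a variance-reduced PAGE estimate.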
“…In the finite-sum setting (3), we enhance EF21 with a variance reduction technique to reduce the computational complexity. In particular, we adopt the simple and efficient variance-reduced method PAGE (Li et al., 2021; Li, 2021b) (which is optimal for solving problems (3)) into EF21, and call the resulting method EF21-PAGE (Algorithm 3). See Appendix E for more details.…”
Section: Our Contributions (mentioning)
confidence: 99%
“…In the nonconvex setting, the functions $\{f_i\}_{i \in [N]}$ are arbitrary functions that satisfy the following standard smoothness assumption (Johnson and Zhang, 2013; Defazio et al., 2014; Nguyen et al., 2017; Zhou et al., 2018; Fang et al., 2018; Li, 2019), and we assume that the unbiased local stochastic gradient oracle $\nabla f_i(x)$ has bounded local variance (McMahan et al., 2017; Zhao et al., 2021), which is also standard in the federated learning literature. For simplicity, we use $\nabla^b f_i(x)$ to denote the stochastic gradient oracle that uses a minibatch of size $b$, which is the average of $b$ independent unbiased stochastic gradient oracles $\nabla f_i(x)$.…”
Section: Assumptions About the Functions (mentioning)
confidence: 99%
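Spelled out in standard notation (writing $\widetilde{\nabla} f_i(x)$ for the stochastic oracle to distinguish it from the true gradient; the symbols here are assumed, not copied from the quoted paper), the two assumptions read roughly as:

\[
\|\nabla f_i(x) - \nabla f_i(y)\| \le L \|x - y\| \quad \text{for all } x, y \text{ and } i \in [N] \qquad \text{(L-smoothness)},
\]
\[
\mathbb{E}\big[\widetilde{\nabla} f_i(x)\big] = \nabla f_i(x), \qquad
\mathbb{E}\big[\|\widetilde{\nabla} f_i(x) - \nabla f_i(x)\|^2\big] \le \sigma^2 \qquad \text{(bounded local variance)},
\]

and the minibatch oracle averages $b$ independent calls, $\nabla^b f_i(x) = \frac{1}{b} \sum_{j=1}^{b} \widetilde{\nabla}^{(j)} f_i(x)$, so its variance is reduced to $\sigma^2 / b$.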
“…Among these algorithms, most of them need to periodically compute full gradients to reduce the variance of the gradient estimator, e.g. SARAH (Nguyen et al., 2017), PAGE, PP-MARINA, FedPAGE (Zhao et al., 2021), except ZeroSARAH (Li and Richtárik, 2021b), which never needs to compute the full gradient. The idea of ZeroSARAH is to maintain another unbiased approximate gradient estimator for the full gradient, and update both the biased recursive and unbiased approximate estimators in each round.…”
Section: Frecon Algorithm (mentioning)
confidence: 99%
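A loose paraphrase of the two-estimator idea described above, written as illustrative Python; this is not the exact ZeroSARAH update (the precise weights and correction terms are given in Li and Richtárik, 2021b), only a sketch of how keeping both a biased recursive estimator and per-component gradient snapshots avoids ever computing a full gradient:

```python
def zerosarah_like_step(grads_new, grads_old, sampled_ids, v_prev, y):
    """Illustrative two-estimator update on a sampled minibatch of components.

    grads_new, grads_old -- gradients of the sampled components at the current
                            and previous iterates (lists of arrays, length b).
    sampled_ids          -- indices of the sampled components in [n].
    v_prev               -- biased SARAH-type recursive estimator from last round.
    y                    -- list of n per-component gradient snapshots whose
                            average approximates the full gradient.
    """
    b = len(sampled_ids)
    # 1) biased recursive (SARAH-type) estimator: previous estimate + minibatch difference
    v_new = v_prev + sum(gn - go for gn, go in zip(grads_new, grads_old)) / b
    # 2) refresh the snapshots for the sampled components only
    for pos, j in enumerate(sampled_ids):
        y[j] = grads_new[pos]
    # 3) approximate full gradient from snapshots -- no full pass over all components
    full_grad_approx = sum(y) / len(y)
    return v_new, full_grad_approx, y
```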