2021
DOI: 10.48550/arxiv.2103.03936
Preprint

Pufferfish: Communication-efficient Models At No Extra Cost

Abstract: To mitigate communication overheads in distributed model training, several studies propose the use of compressed stochastic gradients, usually achieved by sparsification or quantization. Such techniques achieve high compression ratios, but in many cases incur either significant computational overheads or some accuracy loss. In this work, we present PUFFERFISH, a communication- and computation-efficient distributed training framework that incorporates gradient compression into the model training process via …
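The elided tail of the abstract, together with the citation statements below, indicates that PUFFERFISH trains low-rank, pre-factorized networks rather than compressing gradients after the fact. The PyTorch snippet below is a minimal sketch of that general idea, not the authors' implementation; the `LowRankLinear` class, the layer sizes, and the rank of 64 are illustrative assumptions.

```python
# Illustrative sketch only (not the paper's code): replace one dense linear
# layer with a low-rank, pre-factorized pair of thin layers, so that fewer
# parameters (and hence smaller gradients) need to be communicated.
import torch
import torch.nn as nn

class LowRankLinear(nn.Module):
    """Computes y = x @ V^T @ U^T + b, with U (out x r) and V (r x in), r << min(in, out)."""
    def __init__(self, in_features: int, out_features: int, rank: int):
        super().__init__()
        # Two thin factors in place of one dense (out x in) weight matrix.
        self.v = nn.Linear(in_features, rank, bias=False)   # r x in parameters
        self.u = nn.Linear(rank, out_features, bias=True)   # out x r (+ bias) parameters

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.u(self.v(x))

# Parameter-count comparison for a hypothetical 1024 -> 1024 layer at rank 64.
dense = nn.Linear(1024, 1024)
lowrank = LowRankLinear(1024, 1024, rank=64)
n_dense = sum(p.numel() for p in dense.parameters())
n_lowrank = sum(p.numel() for p in lowrank.parameters())
print(f"dense: {n_dense} params, low-rank: {n_lowrank} params")  # ~1.05M vs ~132K
```

Because only the thin factors carry trainable parameters, the gradients exchanged during distributed training shrink by the same ratio as the parameter count.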

Cited by 3 publications (3 citation statements). References 64 publications.

Citation statements (ordered by relevance):
“…Communication-Efficient FL Algorithms - Many recent works proposed sparsification and quantization methods specifically designed for FL (Alistarh et al. 2017, 2018; Albasyoni et al. 2020; Wangni et al. 2017; Wang et al. 2018; Wang, Agarwal, and Papailiopoulos 2021; Wen et al. 2017; Reisizadeh et al. 2020). These methods are also referred to as the sketched approach (Konečný et al. 2016).…”
Section: Related Work
Confidence: 99%
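The quantization methods cited in this snippet reduce the number of bits sent per gradient coordinate. As a generic illustration (not the exact scheme of any particular cited paper), the sketch below applies unbiased stochastic quantization to a fixed number of uniform levels; the function names and the 16-level default are assumptions.

```python
# Generic unbiased stochastic quantization of a gradient vector to `levels`
# uniform levels per coordinate (illustrative only, not a specific cited scheme).
import torch

def stochastic_quantize(grad: torch.Tensor, levels: int = 16):
    """Return (scale, signs, integer levels); the dequantized result is unbiased."""
    scale = grad.abs().max()
    if scale == 0:
        return scale, torch.sign(grad), torch.zeros_like(grad, dtype=torch.int64)
    normalized = grad.abs() / scale * (levels - 1)          # values in [0, levels-1]
    lower = normalized.floor()
    # Round up with probability equal to the fractional part (keeps the estimate unbiased).
    q = lower + (torch.rand_like(normalized) < (normalized - lower)).to(grad.dtype)
    return scale, torch.sign(grad), q.to(torch.int64)

def dequantize(scale, signs, q, levels: int = 16):
    return scale * signs * q.to(scale.dtype) / (levels - 1)

g = torch.randn(8)
packed = stochastic_quantize(g)
print(g)
print(dequantize(*packed))  # low-precision, unbiased reconstruction of g
```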
“…et al. 2018), FedNova (Wang et al. 2020), and SCAFFOLD (Karimireddy et al. 2020), periodically average the full local solutions across all the clients. Many communication-efficient FL strategies, such as gradient (model) sparsification (Wangni et al. 2017; Wang et al. 2018; Alistarh et al. 2018), low-rank approximation (Vogels, Karimireddy, and Jaggi 2020; Wang, Agarwal, and Papailiopoulos 2021), and quantization (Alistarh et al. 2017; Wen et al. 2017; Albasyoni et al. 2020; Reisizadeh et al. 2020) techniques, also periodically aggregate compressed forms of the full local solutions. Adaptive model aggregation techniques (Wang and Joshi 2018a; Haddadpour et al. 2019) adjust the aggregation interval at run-time to reduce the total number of communications; however, they still aggregate the full local models at once.…”
Section: Introduction
Confidence: 99%
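Gradient sparsification, one of the strategies listed in the statement above, sends only the largest-magnitude coordinates of each local update as index-value pairs. The sketch below is a generic top-k illustration rather than the method of any specific cited work; the `fraction` hyperparameter and the helper names are assumptions.

```python
# Generic top-k gradient sparsification: keep only the k largest-magnitude
# entries and communicate them as (indices, values) pairs (illustrative sketch only).
import torch

def topk_sparsify(grad: torch.Tensor, fraction: float = 0.01):
    """Return (indices, values, numel) for the top `fraction` of coordinates."""
    flat = grad.flatten()
    k = max(1, int(fraction * flat.numel()))
    _, indices = torch.topk(flat.abs(), k)
    return indices, flat[indices], flat.numel()

def densify(indices, values, numel, shape):
    out = torch.zeros(numel, dtype=values.dtype)
    out[indices] = values
    return out.reshape(shape)

g = torch.randn(4, 4)
idx, vals, n = topk_sparsify(g, fraction=0.25)
print(densify(idx, vals, n, g.shape))  # dense gradient with only the top 4 entries kept
```

Practical variants typically pair this with error feedback to accumulate the dropped coordinates across rounds; that is omitted here for brevity.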
“…The reconstructed parameters are local to the clients and never sent to the server. Wang et al. [2021a] proposed training low-rank, pre-factorized deep networks to reduce communication in distributed learning. Other methods, such as compression and knowledge distillation, have also been used in FL to reduce communication costs.…”
Section: Related Work
Confidence: 99%