2020
DOI: 10.48550/arxiv.2002.12410
Preprint

On Biased Compression for Distributed Learning

Abstract: In the last few years, various communication compression techniques have emerged as an indispensable tool helping to alleviate the communication bottleneck in distributed learning. However, despite the fact that biased compressors often show superior performance in practice when compared to the much more studied and understood unbiased compressors, very little is known about them. In this work we study three classes of biased compression operators, two of which are new, and their performance when applied to (stocha…

Cited by 49 publications (103 citation statements)
References 12 publications
“…However, such a simple strategy does not converge to the exact solution due to the compression error, and may even lead to divergence as the compression error accumulates. Examples have been provided in [11], [12] to illustrate this. Therefore, communication compression in decentralized algorithms has gained considerable attention recently.…”
Section: A. Related Work and Motivation (mentioning)
confidence: 99%
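To make the accumulation argument in the excerpt above concrete, here is a minimal NumPy sketch (hypothetical code, not taken from either cited paper) of the naive scheme: every worker transmits a greedily compressed gradient, the server averages what it receives, and nothing ever corrects the per-step gap between the true average gradient and the compressed one, so that gap can build up over iterations.

import numpy as np

def top_k(v, k):
    """Biased greedy compressor: keep the k largest-magnitude entries, zero the rest."""
    out = np.zeros_like(v)
    keep = np.argsort(np.abs(v))[-k:]
    out[keep] = v[keep]
    return out

def naive_compressed_step(x, local_grads, lr, k):
    """One step of naively compressed distributed GD: the server averages the
    compressed gradients C(g_i). The discrepancy between mean(g_i) and
    mean(C(g_i)) is simply discarded here, which is the error that can
    accumulate across iterations."""
    avg_compressed = np.mean([top_k(g, k) for g in local_grads], axis=0)
    return x - lr * avg_compressed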
“…[33]-[37], and also includes biased and non-contractive compressors, such as the norm-sign compressor. Moreover, it is straightforward to check that the class of compressors satisfying Assumption 1 also covers the three classes of biased compressors considered in [12]. In other words, Assumption 1 is weaker than various commonly used assumptions for compressors in the literature.…”
Section: A. Compressors (mentioning)
confidence: 99%
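For illustration, a small NumPy sketch of one norm-sign variant together with the relative compression error that typical compressor assumptions bound; the choice of the infinity-norm scaling and the function names are my own assumptions, since the exact definition of the norm-sign compressor varies across the cited works.

import numpy as np

def norm_sign(x):
    """A norm-sign style compressor: transmit sign(x) plus one scalar norm.
    The l-infinity scaling used here is an assumed variant for illustration."""
    return np.linalg.norm(x, ord=np.inf) * np.sign(x)

def relative_error(compress, x):
    """Empirical ratio ||C(x) - x||^2 / ||x||^2, the quantity that contractive
    compressor assumptions bound by a constant strictly below 1."""
    return np.linalg.norm(compress(x) - x) ** 2 / np.linalg.norm(x) ** 2

x = np.random.default_rng(0).standard_normal(1000)
print(relative_error(norm_sign, x))  # typically exceeds 1 here: biased and non-contractive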
“…To remove the synchronization barrier, asynchronous update methods have been proposed [52,64,68,76,120,123]. There are also approaches that combine multiple strategies listed above [15,17,44,56,80]. On the other hand, research on model parallelism studies how to allocate model parameters and training computation across compute units in a cluster to maximize training throughput and minimize communication overheads.…”
Section: Distributed Deep Learning (mentioning)
confidence: 99%
“…A common choice of C(•) is top-k (Basu et al., 2019) or the sign operation (which leads to signSGD (Bernstein et al., 2018)). Although this naive compression method is intuitive, it can diverge in practice, even on simple quadratic problems (Beznosikov et al., 2020) or constrained linear problems (Karimireddy et al., 2019). Intuitively, one of the major drawbacks of naive compression is that the compression error accumulates during the training process.…”
Section: Existing Solutions and Drawbacks (mentioning)
confidence: 99%
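As a concrete instance of the sign choice for C(•) mentioned above, the sketch below (single-worker case only, with my own naming) shows the one-bit-per-coordinate compressor and the resulting signSGD-style update; the multi-worker majority-vote variant studied by Bernstein et al. is not shown.

import numpy as np

def sign_compress(g):
    """One bit per coordinate: only the signs of the gradient are kept."""
    return np.sign(g)

def signsgd_step(x, grad, lr):
    """signSGD-style update x <- x - lr * sign(g); applying this naively,
    with no error correction, is exactly the scheme criticised above."""
    return x - lr * sign_compress(grad)

# usage sketch on a toy quadratic f(x) = 0.5 * ||x||^2, whose gradient is x
x = np.array([1.0, -2.0, 0.5])
x = signsgd_step(x, grad=x, lr=0.1)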
“…It directly compresses the fresh local gradient on each worker before uploading it to the server. However, the compression can slow down convergence or even cause divergence (Beznosikov et al., 2020) due to the loss of information at each compression step. Later, the error feedback strategy (Stich et al., 2018; Karimireddy et al., 2019) was proposed to alleviate this problem and reduce the information loss by maintaining a compensating error sequence.…”
Section: Introduction (mentioning)
confidence: 99%
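A minimal single-worker sketch of that compensating error sequence (the function name and signature are mine, not from the cited papers): the residual dropped by the compressor is carried over and added back before the next compression.

import numpy as np

def error_feedback_step(x, e, grad, compress, lr):
    """One error-feedback step in the spirit of Stich et al. (2018) and
    Karimireddy et al. (2019): compress the step plus the accumulated
    residual, apply the compressed part, and keep what was dropped."""
    p = lr * grad + e            # add back what compression previously lost
    delta = compress(p)          # only this compressed update is transmitted
    return x - delta, p - delta  # new iterate, new residual

# usage sketch with a scaled-sign compressor
scaled_sign = lambda v: (np.linalg.norm(v, 1) / v.size) * np.sign(v)
x, e = np.zeros(3), np.zeros(3)
x, e = error_feedback_step(x, e, grad=np.array([0.5, -2.0, 0.1]), compress=scaled_sign, lr=0.1)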