2019 International Joint Conference on Neural Networks (IJCNN)
DOI: 10.1109/ijcnn.2019.8852172

Sparse Binary Compression: Towards Distributed Deep Learning with minimal Communication

Abstract: Currently, progressively larger deep neural networks are trained on ever-growing data corpora. As this trend is only going to increase in the future, distributed training schemes are becoming increasingly relevant. A major issue in distributed training is the limited communication bandwidth between contributing nodes, or prohibitive communication cost in general. These challenges become even more pressing as the number of computation nodes increases. To counteract this development, we propose sparse binary compression…
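As a rough illustration of the kind of scheme the title points to, the sketch below combines top-k gradient sparsification with a binarisation of the surviving values and local accumulation of the compression error. The function name, the averaging rule, and the residual handling are illustrative assumptions, not the paper's reference implementation.

```python
import numpy as np

def sparse_binary_compress(grad, sparsity=0.01):
    """Illustrative sketch: keep the k largest-magnitude gradient entries,
    represent them by a single mean value of the dominant sign group, and
    return the compression error so the caller can accumulate it locally."""
    flat = grad.ravel()
    if not flat.any():
        return np.zeros_like(grad), np.zeros_like(grad)
    k = max(1, int(sparsity * flat.size))
    top = np.argpartition(np.abs(flat), -k)[-k:]   # indices of the k largest magnitudes
    kept = flat[top]
    pos, neg = kept[kept > 0], kept[kept < 0]
    # Binarise: keep only the sign group whose mean magnitude is larger.
    if pos.size and (neg.size == 0 or pos.mean() >= -neg.mean()):
        idx, value = top[kept > 0], pos.mean()
    else:
        idx, value = top[kept < 0], neg.mean()
    compressed = np.zeros_like(flat)
    compressed[idx] = value
    residual = flat - compressed                   # error kept locally for the next round
    return compressed.reshape(grad.shape), residual.reshape(grad.shape)
```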

Cited by 152 publications (108 citation statements)
References 23 publications (47 reference statements)
“…We adopt the D-DSGD scheme proposed in [25, Section III], which is an extension of the one proposed in [18], for digital transmission. With the D-DSGD scheme, gradient estimate g_m(θ_t),…”
Section: Digital DSGD (mentioning)
confidence: 99%
“…They can send more information bits at the beginning of the DSGD algorithm when the gradient estimates have higher variances, and reduce the number of transmitted bits over time as the variance decreases. We observed empirically that this improves the performance compared to the standard approach in the literature, where the same compression scheme is applied at each iteration [28].…”
Section: Digital DSGD (D-DSGD) (mentioning)
confidence: 76%
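A minimal sketch of that variance-adaptive bit budget, with the linear decay rule and parameter names chosen purely for illustration; combined with a quantiser such as the one above, a worker would call quantize(g, bits_for_iteration(t, T)).

```python
def bits_for_iteration(t, total_iters, max_bits=8, min_bits=2):
    """Spend more bits on early, high-variance gradient estimates and fewer later on."""
    frac = t / max(1, total_iters - 1)
    return round(max_bits - frac * (max_bits - min_bits))
```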
“…The optimal solution for this scheme will require carefully allocating channel resources across the workers and the available power of each worker across iterations, together with an efficient gradient quantization scheme. For gradient compression, we will consider state-of-the-art quantization approaches together with local error accumulation [28].…”
Section: B. Our Contributions (mentioning)
confidence: 99%
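Local error accumulation (often called error feedback) can be wrapped around any compression operator in a few lines; the class below shows the generic pattern rather than the exact formulation used in [28].

```python
import numpy as np

class ErrorFeedbackCompressor:
    """Generic error-feedback wrapper: whatever the compressor discards
    is remembered and added back before the next compression step."""

    def __init__(self, compress_fn):
        self.compress_fn = compress_fn
        self.residual = None

    def __call__(self, grad):
        if self.residual is None:
            self.residual = np.zeros_like(grad)
        corrected = grad + self.residual            # re-inject last round's error
        compressed = self.compress_fn(corrected)
        self.residual = corrected - compressed      # store the new compression error
        return compressed
```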
“…However, with the peculiarity that now D(w, q) (approximately) measures the distortion of w and q in the space of output distributions instead of the Euclidean space. The advantage of the rate-distortion objective (9) is that, after the FIM has been calculated, it can be solved by applying common techniques from the source coding literature, such as the scalar Lloyd algorithm.…”
Section: [section title not recoverable] (mentioning)
confidence: 99%
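For concreteness, the scalar Lloyd step mentioned in the quote can be written as a Fisher-weighted one-dimensional k-means: with diagonal FIM entries f_i, the distortion Σ_i f_i (w_i − q(w_i))² is reduced by assigning each weight to its nearest codeword and moving each codeword to the Fisher-weighted mean of its cluster. The sketch below assumes strictly positive Fisher values; variable names are chosen for the example.

```python
import numpy as np

def fisher_weighted_lloyd(w, fisher, n_levels=8, iters=50):
    """Scalar Lloyd quantisation reducing sum_i fisher_i * (w_i - q_i)**2.
    w and fisher are 1-D arrays; fisher is assumed strictly positive."""
    centers = np.quantile(w, np.linspace(0.0, 1.0, n_levels))        # initial codebook
    for _ in range(iters):
        assign = np.argmin(np.abs(w[:, None] - centers[None, :]), axis=1)
        for k in range(n_levels):
            mask = assign == k
            if mask.any():
                centers[k] = np.average(w[mask], weights=fisher[mask])
    return centers[assign], centers
```

The assignment step uses the unweighted nearest codeword because the per-point Fisher factor does not change which codeword is closest; only the centroid update needs the weighting.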