Proceedings of the 2021 ACM SIGCOMM Conference (SIGCOMM '21)
DOI: 10.1145/3452296.3472904

Efficient sparse collective communication and its application to accelerate distributed deep learning

Abstract: Efficient collective communication is crucial to parallel-computing applications such as distributed training of large-scale recommendation systems and natural language processing models. Existing collective communication libraries focus on optimizing operations for dense inputs, resulting in transmissions of many zeros when inputs are sparse. This counters current trends that see increasing data sparsity in large models. We propose OmniReduce, an efficient streaming aggregation system that exploits sparsity to…
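The abstract's core idea — avoid transmitting zeros when aggregating sparse inputs — can be illustrated with a minimal sketch. This is not OmniReduce's actual streaming protocol; it is a simplified model in which each worker sends only its non-zero (index, value) pairs and the aggregator sums them:

```python
# Illustrative sketch only (not OmniReduce's protocol): aggregate
# sparse gradients by exchanging non-zero (index, value) pairs.
from collections import defaultdict

def to_sparse(dense):
    """Keep only non-zero entries as an {index: value} map."""
    return {i: v for i, v in enumerate(dense) if v != 0}

def sparse_allreduce(workers):
    """Sum sparse vectors from all workers; only non-zeros 'travel'."""
    total = defaultdict(float)
    for sparse in workers:
        for i, v in sparse.items():
            total[i] += v
    return dict(total)

# Two workers with mostly-zero gradients: 3 pairs are exchanged
# instead of 16 dense values.
g1 = to_sparse([0, 0, 1.5, 0, 0, 0, 0, 0])
g2 = to_sparse([0, 0, 0.5, 0, 2.0, 0, 0, 0])
print(sparse_allreduce([g1, g2]))  # {2: 2.0, 4: 2.0}
```

A dense allreduce would ship every element of every vector regardless of value; here the traffic scales with the number of non-zeros instead.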

Cited by 44 publications (13 citation statements); references 28 publications.
“…Last, some solutions target programmable switches [11,24,25], that provide more flexibility by allowing the programmer to specify the type of processing to be executed on the packets with high-level programming languages. Programmable switches are often implemented through Reconfigurable Match-Action Tables (RMTs) [26-29], and can be configured with the P4 programming language [30-32].…”
Section: Programmable Switches
Confidence: 99%

Citing work: De Sensi, Di Girolamo, Ashkboos et al., "Flare: Flexible In-Network Allreduce", 2021 (preprint)
“…F2 – Sparse data. Many applications need to reduce sparse data [5,25,35,39], i.e., data containing mostly null values. To save bandwidth and improve performance, an application might only transmit and reduce the non-null values.…”
Section: Limitations
Confidence: 99%
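The bandwidth argument in this citation statement can be made concrete with a back-of-envelope calculation. Assuming (hypothetically) 4-byte values and 4-byte indices, transmitting only the non-null entries of a 1%-dense vector cuts traffic by 50x:

```python
# Back-of-envelope comparison: dense transfer ships every element;
# sparse transfer ships (index, value) pairs for non-zeros only.
# Element/index sizes are illustrative assumptions, not from the paper.

def dense_bytes(n, elem_size=4):
    """Bytes to send a dense vector of n elements."""
    return n * elem_size

def sparse_bytes(nnz, elem_size=4, index_size=4):
    """Bytes to send nnz non-zeros as (index, value) pairs."""
    return nnz * (elem_size + index_size)

n, nnz = 1_000_000, 10_000  # 1% density
print(dense_bytes(n))       # 4000000
print(sparse_bytes(nnz))    # 80000 -> a 50x reduction
```

The crossover point matters: since each sparse entry carries an index alongside its value, the sparse encoding only wins when density is below elem_size / (elem_size + index_size), i.e. 50% under these assumptions.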
