2019
DOI: 10.1147/jrd.2019.2947013

BlueConnect: Decomposing all-reduce for deep learning on heterogeneous network hierarchy

Cited by 64 publications (44 citation statements)
References 9 publications
“…Indeed, the fact that communication is a major performance bottleneck in DDL is well known [32], and many works [10,35,39,44,58,66] have proposed optimizations to achieve high-bandwidth collective communication specialized for DDL. In addition, a recent body of work, primarily within the ML community, has developed gradient compression methods [1,2,42,63,67] to reduce communication time by sending a smaller amount of data, albeit at the cost of reduced training quality due to the lossy nature of compression.…”
Section: Model
confidence: 99%
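
The compression methods cited in this statement differ in detail, but the common idea is to transmit a small lossy summary of each gradient instead of the full tensor. Below is a minimal top-k sparsification sketch in numpy for intuition only; it is not the algorithm of any specific cited work, and the function names and the 1% keep-ratio are illustrative assumptions.

import numpy as np

def topk_sparsify(grad, k):
    # Keep only the k largest-magnitude entries; send (indices, values)
    # instead of the dense tensor.
    flat = grad.ravel()
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    return idx, flat[idx]

def densify(idx, values, shape):
    # Receiver side: rebuild a dense (lossy) approximation of the gradient.
    flat = np.zeros(int(np.prod(shape)), dtype=values.dtype)
    flat[idx] = values
    return flat.reshape(shape)

grad = np.random.randn(1024, 1024).astype(np.float32)
idx, vals = topk_sparsify(grad, k=grad.size // 100)   # keep roughly 1% of entries
approx = densify(idx, vals, grad.shape)
print("bytes on the wire:", idx.nbytes + vals.nbytes, "vs dense:", grad.nbytes)

The lossiness mentioned in the statement is visible here: the reconstruction discards about 99% of the entries, which is exactly the accuracy-versus-communication tradeoff those methods navigate (often with error-feedback corrections not shown in this sketch).
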
“…Efficient communication in DDL. Several efforts optimize DDL communication, ranging from designing high-performance PS software [43] and transfer schedulers [20,25,50], to improving collective communication in heterogeneous network fabrics [10,28] and within multi-GPU servers [66], to developing in-network reduction systems [35,39,44,57,58], to customizing network congestion protocols and architecture [18]. OmniReduce leverages data sparsity to optimize communication and is complementary to these efforts.…”
Section: Other Related Work
confidence: 99%
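
Several of the collective-communication efforts cited here, including the BlueConnect paper this report covers (per its title), build on the standard identity that an all-reduce can be expressed as a reduce-scatter followed by an all-gather. The single-process numpy sketch below simulates that decomposition on a flat logical ring purely for intuition; it is not BlueConnect's implementation, which further splits these phases to match a heterogeneous network hierarchy, and all names here are illustrative.

import numpy as np

def simulated_ring_allreduce(per_rank_data):
    # Number of simulated ranks; each rank's vector is split into p chunks.
    p = len(per_rank_data)
    bufs = [np.array_split(np.asarray(d, dtype=np.float64), p) for d in per_rank_data]

    # Phase 1: ring reduce-scatter. After p-1 steps, rank r holds the fully
    # reduced chunk (r + 1) % p.
    for step in range(p - 1):
        msgs = [(r, (r - step) % p, bufs[r][(r - step) % p].copy()) for r in range(p)]
        for src, c, data in msgs:
            bufs[(src + 1) % p][c] = bufs[(src + 1) % p][c] + data

    # Phase 2: ring all-gather. Each step forwards a fully reduced chunk one
    # hop around the ring until every rank holds every chunk.
    for step in range(p - 1):
        msgs = [(r, (r + 1 - step) % p, bufs[r][(r + 1 - step) % p].copy()) for r in range(p)]
        for src, c, data in msgs:
            bufs[(src + 1) % p][c] = data

    return [np.concatenate(chunks) for chunks in bufs]

# Sanity check against a direct elementwise sum.
p, n = 4, 16
data = [np.arange(n, dtype=np.float64) * (r + 1) for r in range(p)]
result = simulated_ring_allreduce(data)
assert all(np.allclose(out, sum(data)) for out in result)
print("all simulated ranks hold the reduced vector")

The point of the decomposition is that each phase is itself a standard collective, so it can be scheduled onto whatever bandwidth is available at each level of the network rather than treating all-reduce as a single monolithic step.
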
“…Moreover, several recent works have optimized all-gather algorithms under additional, specific constraints: in Reference [8], Kang et al. provided a solution for intergroup cooperation, rapidly accelerating data gathering between two disjoint process sets; in Reference [29], Zhou et al. analyzed and improved all-gather behavior for multi-/many-core processors in compute clusters; and in Reference [2], Cho et al. presented an efficient communication library, with an all-gather implementation, for distributed deep learning that is highly optimized for popular GPU-based platforms.…”
Section: Regular All-gather Algorithms in Use
confidence: 99%
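
For context on the "regular all-gather algorithms" this statement discusses, the sketch below simulates one classic textbook variant, recursive doubling, in which the set of gathered chunks doubles at every step so that p ranks finish in log2(p) exchange rounds. It is a generic algorithm for illustration, not the implementation of any cited library; names and sizes are assumptions.

import numpy as np

def recursive_doubling_allgather(per_rank_chunk):
    # p must be a power of two for this simple variant.
    p = len(per_rank_chunk)
    # have[r] maps chunk index -> chunk currently known to simulated rank r.
    have = [{r: np.asarray(per_rank_chunk[r])} for r in range(p)]
    k = 1
    while k < p:
        # Snapshot first so both partners exchange their pre-step contents.
        snapshot = [dict(h) for h in have]
        for r in range(p):
            have[r].update(snapshot[r ^ k])   # swap everything with partner r XOR k
        k *= 2
    return [np.concatenate([have[r][c] for c in range(p)]) for r in range(p)]

p = 8
chunks = [np.full(4, r, dtype=np.float32) for r in range(p)]
gathered = recursive_doubling_allgather(chunks)
assert all(np.array_equal(g, gathered[0]) for g in gathered)
print("every rank gathered", gathered[0].size, "elements in", p.bit_length() - 1, "rounds")

A ring all-gather (as in the sketch after the previous statement) instead takes p-1 smaller steps; which variant wins depends on message sizes and, as the cited works emphasize, on the topology underneath.
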
“…On the system side, one line of work further optimizes the communication primitives and communication strategies to take advantage of properties of the underlying ML workload. Recent examples in this direction include (Hashemi et al., 2019; Jayarajan et al., 2019; Cho et al., 2019; Jia et al., 2019; Wang et al., 2018d). Another line of work tries to automatically optimize the tradeoff introduced by these system relaxation techniques (e.g., the communication frequency, which is often a hyperparameter).…”
Section: System Optimization and Automatic Tradeoff Management
confidence: 99%
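
The communication-frequency hyperparameter mentioned in this statement can be illustrated with a toy local-SGD loop: simulated workers take gradient steps on their own quadratic objectives and their models are averaged only every sync_every steps. The problem, names, and numbers below are purely hypothetical; the sketch only shows the qualitative tradeoff (fewer synchronization rounds, more disagreement between workers), not any cited method.

import numpy as np

rng = np.random.default_rng(0)
p, d = 4, 8
targets = rng.normal(size=(p, d))       # each worker's local optimum

def run(sync_every, steps=200, lr=0.1):
    # Local SGD on f_i(w) = 0.5 * ||w - target_i||^2 for p simulated workers.
    w = np.zeros((p, d))
    rounds, max_drift = 0, 0.0
    for t in range(1, steps + 1):
        w -= lr * (w - targets)                              # local gradient step
        max_drift = max(max_drift, float(np.linalg.norm(w - w.mean(axis=0))))
        if t % sync_every == 0:
            w[:] = w.mean(axis=0)                            # model averaging (an all-reduce)
            rounds += 1
    return rounds, max_drift

for h in (1, 10, 50):
    rounds, drift = run(h)
    print(f"sync every {h:>2} steps: {rounds:>3} communication rounds, max worker drift {drift:.3f}")

Automatic tradeoff management, in the sense of the statement, amounts to tuning knobs like sync_every (or a compression level) to balance communication cost against this kind of drift.
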