2021
DOI: 10.48550/arxiv.2111.04867
Preprint

Synthesizing Collective Communication Algorithms for Heterogeneous Networks with TACCL

Abstract: Large ML models and datasets have necessitated the use of multi-GPU systems for distributed model training. To harness the power offered by multi-GPU systems, it is critical to eliminate bottlenecks in inter-GPU communication, a problem made challenging by the heterogeneous nature of interconnects. In this work, we present TACCL, a synthesizer of collective communication primitives for large-scale multi-GPU systems. TACCL encodes a profiled topology and input size into a synthesis problem to generate optimize…

Cited by 2 publications (5 citation statements) | References 10 publications
“…Collective Communication on Heterogeneous Clusters. TACCL [39] models communication as a mixed integer linear programming problem and finds a routing and scheduling of each data chunk that minimizes communication time. BlueConnect [8] decomposes All-Reduce to fit into a heterogeneous network hierarchy.…”
Section: Related Work
confidence: 99%
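The routing/scheduling decision described in this citation can be illustrated with a toy instance. TACCL itself encodes the problem as a mixed integer linear program; the sketch below instead brute-forces a tiny hypothetical topology (the node names, link times, and chunk count are all assumptions for illustration, not from the paper):

```python
from itertools import product

# Hypothetical 3-node topology: directed links with per-chunk transfer times.
links = {("A", "B"): 2.0, ("A", "C"): 1.0, ("C", "B"): 1.0}
routes = [[("A", "B")], [("A", "C"), ("C", "B")]]  # candidate A->B routes
chunks = 2  # number of chunks to route from A to B

def makespan(assignment):
    """Finish time if chunks sharing a link are serialized on that link."""
    load = {}
    for r in assignment:
        for link in routes[r]:
            load[link] = load.get(link, 0.0) + links[link]
    return max(load.values())

# Enumerate every per-chunk route choice and keep the fastest schedule.
best = min(product(range(len(routes)), repeat=chunks), key=makespan)
```

Here, sending one chunk per route overlaps the direct link with the two-hop path, halving the finish time versus pushing both chunks down the direct link. An MILP reaches the same kind of answer without enumeration, which is what makes the approach scale to real topologies.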
“…Chunks. Communication data is usually broken into multiple chunks [32,47,56], and these chunks are then fed into the 2×D-stage pipeline to keep all dimensions busy. A chunk is a portion of the data participating in the collective, and the collective algorithm can work on each chunk independently.…”
Section: Multi-rail Hierarchical Collective Comm. Algorithms
confidence: 99%
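The chunking idea in this citation can be sketched in a few lines. The helper name and chunk count below are hypothetical, not from the cited papers:

```python
def split_into_chunks(buffer, num_chunks):
    """Split `buffer` into `num_chunks` near-equal, independent chunks."""
    base, rem = divmod(len(buffer), num_chunks)
    chunks, start = [], 0
    for i in range(num_chunks):
        size = base + (1 if i < rem else 0)  # spread the remainder evenly
        chunks.append(buffer[start:start + size])
        start += size
    return chunks

data = list(range(10))
chunks = split_into_chunks(data, 4)
# Because each chunk can be processed independently, chunk i can occupy
# pipeline stage k while chunk i+1 occupies stage k-1, keeping all
# dimensions of the hierarchical collective busy.
```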
“…On the other hand, step_latency is determined by the network component latencies (e.g., NIC latency, link latency, etc.) when transferring a minimum-size message between two NPUs [56]. On real systems, A_K can be calculated by running a minimum-size collective on dimK.…”
Section: Understanding All Latency Parameters
confidence: 99%
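The latency term in this citation fits the standard alpha-beta cost model, where a step's time is a fixed per-step latency plus a bandwidth-proportional term. A minimal sketch (function name and the numeric values are assumptions for illustration):

```python
def step_time(message_size, step_latency, bandwidth):
    """Alpha-beta model: fixed per-step latency (alpha) plus a
    bandwidth-proportional transfer term (size / bandwidth)."""
    return step_latency + message_size / bandwidth

# For a minimum-size message the bandwidth term is negligible, so timing a
# minimum-size collective on dimension K isolates the latency constant A_K.
min_msg = 1          # bytes; effectively zero bandwidth cost
A_K = step_time(min_msg, step_latency=2e-6, bandwidth=50e9)
```

This is why the citation measures A_K with a minimum-size collective: the result is dominated by the NIC and link latencies rather than by link bandwidth.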