2020
DOI: 10.1109/tpds.2019.2928289
Evaluating Modern GPU Interconnect: PCIe, NVLink, NV-SLI, NVSwitch and GPUDirect

Abstract: High-performance multi-GPU computing has become an inevitable trend due to the ever-increasing demand for computation capability in emerging domains such as deep learning, big data, and planet-scale simulations. However, the lack of a deep understanding of how modern GPUs can be connected, and of the real impact of state-of-the-art interconnect technology on multi-GPU application performance, has become a hurdle. In this paper, we fill the gap by conducting a thorough evaluation of the five latest types of modern GPU interconnect…

Cited by 143 publications (71 citation statements) · References 41 publications
“…Further important questions are how the hardware components are composed to avoid bottlenecks. Li et al [100] have performed a comprehensive performance evaluation of recent GPU interconnects. In terms of energy consumption, Wang et al [181] provide evaluations that compare FPGAs to GPUs.…”
Section: Infrastructure (mentioning; confidence: 99%)
“…When the filtering stage is applied at the GPU, the AllGather collective will be applied on data residing in GPU memory and not on data residing in the CPU memory (as is the case when the filtering is applied on the CPU). Applying AllGather on data residing on the GPU incurs the extra cost of moving data across the PCIe interconnect, even when the GPUDirect [36,49] Technology is enabled.…”
Section: Filtering Stage (mentioning; confidence: 99%)
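The quoted passage above turns on whether a collective operates on device-resident or host-resident buffers. As a minimal sketch (not the cited paper's code; buffer sizes and names are illustrative), the CUDA/MPI snippet below issues an AllGather directly on GPU allocations through a CUDA-aware MPI; whether the transfer is staged through host memory or moved peer-to-peer via GPUDirect depends on the MPI build and the node's interconnect topology.

```cuda
// Sketch: AllGather over device-resident buffers with a CUDA-aware MPI.
// Each rank contributes `count` floats from GPU memory; the gathered result
// also lands in GPU memory. Illustrative only -- not the cited paper's code.
#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank = 0, nranks = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    const size_t count = 1 << 20;                 // per-rank element count (assumed)
    float *d_send = NULL, *d_recv = NULL;
    cudaMalloc((void **)&d_send, count * sizeof(float));
    cudaMalloc((void **)&d_recv, (size_t)nranks * count * sizeof(float));

    // A CUDA-aware MPI accepts device pointers directly; without that support,
    // the buffers would first have to be staged through host memory.
    MPI_Allgather(d_send, (int)count, MPI_FLOAT,
                  d_recv, (int)count, MPI_FLOAT, MPI_COMM_WORLD);

    cudaFree(d_send);
    cudaFree(d_recv);
    MPI_Finalize();
    return 0;
}
```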
“…Commonly, in a single compute node, there are multiple GPUs (e.g. ORNL's Summit has six GPUs, LLNL's Sierra and TokyoTech's Tsubame have four GPUs), which are connected to the CPUs by PCIe or NVLink [36]. We launch a number of MPI ranks per compute node equivalent to the number of GPUs, i.e.…”
Section: D (mentioning; confidence: 99%)
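A common way to realize the "one MPI rank per GPU" mapping mentioned above is to derive a node-local rank and bind each rank to one device. The sketch below does this with an MPI-3 shared-memory communicator split; this is an assumed approach for illustration, not necessarily the cited paper's launch scheme.

```cuda
// Sketch: bind one MPI rank to one GPU within a node.
// The node-local rank comes from MPI_Comm_split_type with MPI_COMM_TYPE_SHARED;
// launcher-provided environment variables are another common option.
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    MPI_Comm node_comm;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &node_comm);
    int local_rank = 0;
    MPI_Comm_rank(node_comm, &local_rank);

    int ngpus = 0;
    cudaGetDeviceCount(&ngpus);
    cudaSetDevice(local_rank % ngpus);            // rank i on a node drives GPU i

    printf("node-local rank %d -> GPU %d of %d\n",
           local_rank, local_rank % ngpus, ngpus);

    MPI_Comm_free(&node_comm);
    MPI_Finalize();
    return 0;
}
```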
“…Data is fetched over a high-latency, low-bandwidth channel during kernel execution and page migrations incur additional overhead due to fault handling. Given that on current systems, GPU memory bandwidth is an order-of-magnitude higher than that of the CPU-GPU interconnect [33], a device-only placement policy would appear to be the natural choice when programmability is not a concern. Notwithstanding, placement decisions are more nuanced in practice for several reasons.…”
Section: Introduction (mentioning; confidence: 99%)
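The trade-off described above (demand paging over a slow CPU-GPU link versus device-resident data served at full GPU memory bandwidth) can be seen with CUDA managed memory. The sketch below is illustrative and assumes a single-GPU system; dropping the prefetch leaves the kernel's first accesses to fault and migrate pages across PCIe or NVLink.

```cuda
// Sketch: managed memory initialized on the host, then either demand-migrated
// on first GPU access or explicitly prefetched into device memory beforehand.
#include <cuda_runtime.h>

__global__ void scale(float *x, size_t n, float a) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}

int main() {
    const size_t n = 1 << 24;                     // illustrative problem size
    const int dev = 0;
    cudaSetDevice(dev);

    float *x = NULL;
    cudaMallocManaged(&x, n * sizeof(float));
    for (size_t i = 0; i < n; ++i) x[i] = 1.0f;   // pages now resident on the host

    // Without this call, the kernel's first touches trigger page faults and
    // migrations over the CPU-GPU interconnect; with it, the data is placed in
    // device memory up front, so the kernel runs at GPU-memory bandwidth.
    cudaMemPrefetchAsync(x, n * sizeof(float), dev, 0);

    scale<<<(unsigned)((n + 255) / 256), 256>>>(x, n, 2.0f);
    cudaDeviceSynchronize();

    cudaFree(x);
    return 0;
}
```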