2017 IEEE 24th International Conference on High Performance Computing (HiPC)
DOI: 10.1109/hipc.2017.00037

GPU-Centric Communication on NVIDIA GPU Clusters with InfiniBand: A Case Study with OpenSHMEM

Cited by 22 publications (16 citation statements)
References 8 publications
“…2) NVSHMEM Perftest: As discussed in Section II, the NVSHMEM [3,12] communication library provides lightweight communication operations for accessing GPU memory. NVSHMEM is written using the IBV interface.…”
Section: A. Benchmarks
confidence: 99%
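To make the device-initiated model concrete, here is a minimal NVSHMEM sketch (our illustration, not code from the cited perftest) in which each PE puts its rank into a symmetric buffer on its neighbor directly from a CUDA kernel; it assumes a job launched with two or more PEs (e.g., via nvshmrun):

```cuda
#include <cuda_runtime.h>
#include <nvshmem.h>
#include <nvshmemx.h>

// Each PE writes its rank into a symmetric buffer on the next PE.
// nvshmem_int_p is a device-initiated single-element put; the NVSHMEM
// runtime carries it over NVLink or InfiniBand as appropriate.
__global__ void ring_put(int *sym_buf) {
    int me   = nvshmem_my_pe();
    int npes = nvshmem_n_pes();
    nvshmem_int_p(sym_buf, me, (me + 1) % npes);
}

int main() {
    nvshmem_init();
    // Symmetric allocation: the same buffer exists on every PE.
    int *sym_buf = (int *)nvshmem_malloc(sizeof(int));

    ring_put<<<1, 1>>>(sym_buf);
    // Make the put globally visible before anyone reads the buffer.
    nvshmemx_barrier_all_on_stream(0);
    cudaStreamSynchronize(0);

    nvshmem_free(sym_buf);
    nvshmem_finalize();
    return 0;
}
```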
“…Moving data between accelerator memories has been a significant bottleneck in distributed computing environments [20,21]. Unlike earlier systems that rely mainly on CPU-initiated mechanisms [20], moving data residing on accelerator memories has recently involved novel mechanisms, including device-initiated [3,12,22-24] and hardware-transparent migration using unified memory models [25,26].…”
Section: Related Work
confidence: 99%
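For contrast with device-initiated communication, the hardware-transparent migration cited above can be sketched with CUDA unified memory; this is a generic illustration of the mechanism, not code from references [25,26]:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Kernel increments each element; the pages holding the data may reside on
// the host or the device, and the unified memory system migrates them on demand.
__global__ void increment(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

int main() {
    const int n = 1 << 20;
    int *data = nullptr;
    // cudaMallocManaged returns a pointer valid on both host and device;
    // migration between memories is handled transparently by the driver/hardware.
    cudaMallocManaged(&data, n * sizeof(int));
    for (int i = 0; i < n; ++i) data[i] = i;      // first touched on the host

    increment<<<(n + 255) / 256, 256>>>(data, n); // pages migrate to the GPU
    cudaDeviceSynchronize();                      // required before host access

    printf("data[0] = %d\n", data[0]);            // pages migrate back on fault
    cudaFree(data);
    return 0;
}
```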
“…After fulfilling data transfer between CPUs, the function cudaMemcpyDeviceToHost is called to transfer the data from CPU to the target GPU. The latest device introduced by NVIDIA Corporation is Tesla V100, which provides the NVLink bus technique [32] to achieve communication between GPUs directly.…”
Section: CUDA and GPU Parallel Algorithm for CFD
confidence: 99%
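As a rough sketch of such direct GPU-to-GPU movement, CUDA's peer-to-peer API copies between device memories without staging through the host; on NVLink-connected GPUs such as the V100 this traverses the NVLink fabric. The device IDs and buffer size below are illustrative assumptions:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int canAccess = 0;
    // Check whether device 0 can address device 1's memory directly
    // (true when the GPUs share NVLink or a common PCIe root complex).
    cudaDeviceCanAccessPeer(&canAccess, 0, 1);
    if (!canAccess) { printf("no peer access\n"); return 1; }

    const size_t bytes = 1 << 20;
    float *src, *dst;

    cudaSetDevice(0);
    cudaDeviceEnablePeerAccess(1, 0);  // map device 1 into device 0's address space
    cudaMalloc(&src, bytes);

    cudaSetDevice(1);
    cudaMalloc(&dst, bytes);

    // Direct device-to-device copy; with peer access enabled this moves data
    // over NVLink/PCIe without a bounce through host memory.
    cudaMemcpyPeer(dst, 1, src, 0, bytes);
    cudaDeviceSynchronize();

    cudaFree(dst);
    cudaSetDevice(0);
    cudaFree(src);
    return 0;
}
```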
“…GPU Global Address Space (GGAS) [27] implements intra-kernel networking by adding explicit hardware in the GPU to support a cluster-wide global address space. Oden et al [29], GPUrdma [10], and Potluri et al [32] all explore techniques to implement InfiniBand entirely on the GPU. Unfortunately, these works either have challenges with performance [29] or data visibility [10,32] related to the GPU's relaxed memory consistency model.…”
Section: Related Work
confidence: 99%
“…Oden et al [29], GPUrdma [10], and Potluri et al [32] all explore techniques to implement InfiniBand entirely on the GPU. Unfortunately, these works either have challenges with performance [29] or data visibility [10,32] related to the GPU's relaxed memory consistency model. Klenk et al [17,18] explore a number of techniques and communication models to support communication directly from the GPU and show good performance in a number of cases.…”
Section: Related Work
confidence: 99%
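The data-visibility concern raised in these excerpts stems from the GPU's relaxed memory model: a device thread's writes are not guaranteed to become visible to the CPU or NIC in program order unless a system-scope fence intervenes. A minimal, hypothetical producer sketch of the ordering idiom (zero-copy host memory and the flag protocol are our assumptions, not code from the cited works):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical producer: writes a payload, then publishes a flag. Without the
// system-scope fence, the consumer (CPU or NIC) could observe the flag before
// the payload under the GPU's relaxed memory consistency model.
__global__ void produce(volatile int *payload, volatile int *flag) {
    *payload = 42;           // write the data
    __threadfence_system();  // order the payload write before the flag write
                             // at system scope (visible to CPU/NIC)
    *flag = 1;               // publish
}

int main() {
    int *payload, *flag;
    // Zero-copy host memory, visible to both CPU and GPU (UVA assumed).
    cudaHostAlloc(&payload, sizeof(int), cudaHostAllocMapped);
    cudaHostAlloc(&flag, sizeof(int), cudaHostAllocMapped);
    *payload = 0; *flag = 0;

    produce<<<1, 1>>>(payload, flag);

    while (*(volatile int *)flag == 0) { }  // CPU spins on the flag
    printf("payload = %d\n", *payload);     // guaranteed 42 by the fence

    cudaDeviceSynchronize();
    cudaFreeHost(payload); cudaFreeHost(flag);
    return 0;
}
```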