Proceedings of the First International Workshop on Extreme Scale Programming Models and Middleware 2015
DOI: 10.1145/2832241.2832247
Hyper-Q aware intranode MPI collectives on the GPU

Cited by 4 publications (4 citation statements); references 4 publications.
“…The work in this paper extends our prior study 12,13 in different ways. While our collective designs in our other work 12 target a single node with a single GPU, in this paper, we extend our work and propose a three-level hierarchical framework for GPU collectives for clusters with multi-GPU nodes.…”
supporting
confidence: 63%
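The three-level hierarchy described in this citation (within a GPU, across GPUs sharing a node, and across nodes) can be pictured with a minimal sketch. The code below is only an illustration under the assumptions of one MPI rank per GPU and a CUDA-aware MPI library; the function and variable names are hypothetical, and this is not the cited framework's implementation.

```cuda
// Hypothetical sketch of a three-level hierarchical reduce (not the cited design):
//   level 1: reduce the data resident on each GPU (assumed done already),
//   level 2: combine partial results across the GPUs/ranks sharing a node,
//   level 3: combine node leaders across the cluster.
// Assumes one MPI rank per GPU and a CUDA-aware MPI; compile with nvcc + MPI wrappers.
#include <mpi.h>
#include <cuda_runtime.h>

void hierarchical_reduce(const float* d_local, float* d_result, int count)
{
    MPI_Comm node_comm, leader_comm;
    int world_rank, node_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    // Level-2 communicator: ranks on the same physical node (MPI-3).
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, world_rank,
                        MPI_INFO_NULL, &node_comm);
    MPI_Comm_rank(node_comm, &node_rank);

    // Level-3 communicator: one leader rank per node.
    MPI_Comm_split(MPI_COMM_WORLD, node_rank == 0 ? 0 : MPI_UNDEFINED,
                   world_rank, &leader_comm);

    // Level 2: intra-node reduce of the per-GPU partial results (device buffers,
    // handled by the CUDA-aware MPI library).
    float* d_node = nullptr;
    cudaMalloc(&d_node, count * sizeof(float));
    MPI_Reduce(d_local, d_node, count, MPI_FLOAT, MPI_SUM, 0, node_comm);

    // Level 3: node leaders reduce across the cluster.
    if (node_rank == 0)
        MPI_Reduce(d_node, d_result, count, MPI_FLOAT, MPI_SUM, 0, leader_comm);

    cudaFree(d_node);
    if (leader_comm != MPI_COMM_NULL) MPI_Comm_free(&leader_comm);
    MPI_Comm_free(&node_comm);
}
```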
“…We evaluate different combinations of our algorithms in the proposed framework and discuss our findings. In addition, this paper extends the proposed algorithms in our other work from a single GPU to across the clusters and provides an extended evaluation of using different copy types for collective operations against a wider set of alternative designs. Our experimental results highlight the importance of efficiently using the right copy type in GPU collective operations; this observation is further investigated and discussed in this paper by providing some profiling results.…”
Section: Introduction
mentioning
confidence: 92%
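As a rough illustration of what "copy type" means in this context, the sketch below contrasts a direct peer-to-peer device copy with a copy staged through pinned host memory. It is a generic example, not one of the evaluated designs; the function name and device numbering are assumptions.

```cuda
// Illustrative only (not the paper's code): two copy types a GPU collective
// might use to move a buffer from GPU 0 to GPU 1.
#include <cuda_runtime.h>

void copy_between_gpus(float* d_dst_on_gpu1, const float* d_src_on_gpu0, size_t n)
{
    size_t bytes = n * sizeof(float);
    cudaStream_t stream;
    cudaSetDevice(0);
    cudaStreamCreate(&stream);

    int can_access = 0;
    cudaDeviceCanAccessPeer(&can_access, 0, 1);
    if (can_access) {
        // Copy type 1: direct device-to-device peer copy; the data stays on the
        // device path (NVLink/PCIe peer access), with no host staging.
        cudaMemcpyPeerAsync(d_dst_on_gpu1, 1, d_src_on_gpu0, 0, bytes, stream);
    } else {
        // Copy type 2: stage through pinned host memory, i.e. two transfers.
        float* h_staging = nullptr;
        cudaMallocHost(&h_staging, bytes);
        cudaMemcpyAsync(h_staging, d_src_on_gpu0, bytes,
                        cudaMemcpyDeviceToHost, stream);
        cudaStreamSynchronize(stream);
        cudaSetDevice(1);
        cudaMemcpy(d_dst_on_gpu1, h_staging, bytes, cudaMemcpyHostToDevice);
        cudaFreeHost(h_staging);
        cudaSetDevice(0);
    }
    cudaStreamSynchronize(stream);
    cudaStreamDestroy(stream);
}
```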
“…Recent work [56,44,55,31] leverage CUDA IPC in order to improve various intra-node and inter-node MPI collectives of a single process/application, and thus facilitate the porting to, and improve the performance of HPC applications on GPUs. MVAPICH2 [53], for instance, supports the use of MPI calls directly over GPU memory.…”
Section: Related Work
mentioning
confidence: 99%
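For context on the CUDA IPC mechanism mentioned in this citation, the sketch below shows the usual export/import pattern: one process obtains a cudaIpcMemHandle_t for a device allocation and a peer process on the same node maps it with cudaIpcOpenMemHandle, so data can be copied device-to-device without staging through host memory. The helper names are hypothetical, and the host-side exchange of the handle (for example over MPI or a pipe) is omitted.

```cuda
// Sketch of the CUDA IPC pattern referenced above (illustrative only).
#include <cuda_runtime.h>

// Process A: allocate a device buffer and export an IPC handle for it.
// The handle must be sent to the peer process via some host-side channel
// (e.g., MPI, a pipe, or shared memory) -- omitted here.
cudaIpcMemHandle_t export_buffer(float** d_buf, size_t bytes)
{
    cudaIpcMemHandle_t handle;
    cudaMalloc(d_buf, bytes);
    cudaIpcGetMemHandle(&handle, *d_buf);
    return handle;
}

// Process B: map the peer's buffer into this process and copy from it
// directly on the device.
void import_and_copy(cudaIpcMemHandle_t handle, float* d_dst, size_t bytes)
{
    void* d_peer = nullptr;
    cudaIpcOpenMemHandle(&d_peer, handle, cudaIpcMemLazyEnablePeerAccess);
    cudaMemcpy(d_dst, d_peer, bytes, cudaMemcpyDeviceToDevice);
    cudaIpcCloseMemHandle(d_peer);
}
```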
“…The result of calculating Euclidean distance is continued in the second kernel using Parallel Reduce Interleaved Address method. This method is selected because it can complete the summation in the array [16]. In this second kernel the number of threads used is the same as the number of data features.…”
Section: Finding BMU
mentioning
confidence: 99%
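The "Parallel Reduce Interleaved Address" method referred to here is the classic interleaved-addressing shared-memory reduction. The kernel below is a generic sketch of that pattern (assuming a power-of-two block size), not the cited paper's kernel; names and launch parameters are illustrative.

```cuda
// Classic interleaved-addressing parallel reduction (sum). Each block reduces
// blockDim.x elements in shared memory; thread 0 of each block writes one
// partial sum, which can be reduced further on the host or in a second launch.
__global__ void reduce_interleaved(const float* in, float* block_sums, int n)
{
    extern __shared__ float sdata[];
    unsigned int tid = threadIdx.x;
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;

    // Load one element per thread (0 for out-of-range threads).
    sdata[tid] = (i < n) ? in[i] : 0.0f;
    __syncthreads();

    // Interleaved addressing: at step s, threads whose index is a multiple of
    // 2*s add the element s positions away, halving the active threads each step.
    for (unsigned int s = 1; s < blockDim.x; s *= 2) {
        if (tid % (2 * s) == 0)
            sdata[tid] += sdata[tid + s];
        __syncthreads();
    }

    if (tid == 0)
        block_sums[blockIdx.x] = sdata[0];
}

// Example launch: one partial sum per block of 256 threads.
// reduce_interleaved<<<numBlocks, 256, 256 * sizeof(float)>>>(d_in, d_partial, n);
```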