2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)
DOI: 10.1109/ccgrid.2016.111

CUDA Kernel Based Collective Reduction Operations on Large-scale GPU Clusters

Cited by 14 publications (13 citation statements)
References 12 publications

“…In their work, a host‐staged copy type was used for inter‐process communications. Chu et al. investigated various algorithms for GPU‐aware MPI_Allreduce across the node. However, none of these works considered hierarchical collective designs.…”
Section: Related Work (mentioning)
Confidence: 99%
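
As background for the statement above, here is a minimal sketch of the GPU-aware MPI_Allreduce pattern it refers to: device-resident buffers are passed directly to the MPI call, and a CUDA-aware MPI library handles the GPU data movement internally. The buffer size, datatype, and MPI_SUM operator are illustrative assumptions, not details taken from the cited paper.

#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    const int n = 1 << 20;              /* elements per rank; illustrative assumption */
    float *d_send = NULL, *d_recv = NULL;
    cudaMalloc((void **)&d_send, n * sizeof(float));
    cudaMalloc((void **)&d_recv, n * sizeof(float));
    cudaMemset(d_send, 0, n * sizeof(float));

    /* With a CUDA-aware MPI build, device pointers can be passed directly;
       no explicit staging through host memory is required. */
    MPI_Allreduce(d_send, d_recv, n, MPI_FLOAT, MPI_SUM, MPI_COMM_WORLD);

    cudaFree(d_send);
    cudaFree(d_recv);
    MPI_Finalize();
    return 0;
}
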
“…To achieve efficient inter‐process GPU communication, MPI libraries should be tuned and made GPU‐aware so that they can efficiently communicate data residing in GPU memory. In this regard, researchers have started looking into incorporating GPU‐awareness into the MPI library, targeting both point‐to‐point and collective communications …”
Section: Introduction (mentioning)
Confidence: 99%
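
For contrast, the Related Work statement above mentions host-staged copies; the sketch below shows that conventional pattern, in which device data is staged through host buffers around a standard MPI_Allreduce. The helper name staged_allreduce and all parameters are hypothetical, chosen only to illustrate the extra copies that GPU-aware designs avoid.

#include <mpi.h>
#include <cuda_runtime.h>
#include <stdlib.h>

/* Hypothetical helper: copy device data to host, reduce with a regular
   MPI_Allreduce on host buffers, then copy the result back to the device. */
static void staged_allreduce(float *d_buf, int n, MPI_Comm comm) {
    float *h_send = (float *)malloc(n * sizeof(float));
    float *h_recv = (float *)malloc(n * sizeof(float));

    cudaMemcpy(h_send, d_buf, n * sizeof(float), cudaMemcpyDeviceToHost);
    MPI_Allreduce(h_send, h_recv, n, MPI_FLOAT, MPI_SUM, comm);
    cudaMemcpy(d_buf, h_recv, n * sizeof(float), cudaMemcpyHostToDevice);

    free(h_send);
    free(h_recv);
}
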