2013 IEEE International Conference on Cluster Computing (CLUSTER)
DOI: 10.1109/cluster.2013.6702638

GGAS: Global GPU address spaces for efficient communication in heterogeneous clusters

Abstract: Modern GPUs are powerful high-core-count processors, which are no longer used solely for graphics applications, but are also employed to accelerate computationally intensive general-purpose tasks. For utmost performance, GPUs are distributed throughout the cluster to process parallel programs. In fact, many recent high-performance systems in the TOP500 list are heterogeneous architectures. Despite being highly effective processing units, GPUs on different hosts are incapable of communicating without assistance…
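
The abstract's central idea, GPU-initiated communication through a global address space, can be pictured with a short CUDA sketch. This is only an illustration of the programming model under the assumption that a second local device buffer stands in for the window GGAS would map in from a remote GPU; it does not use the GGAS hardware or API, and all names (put_and_notify, remote_window, remote_flag) are hypothetical.

```cuda
// Minimal sketch, NOT the GGAS API: a second device buffer ("remote_window")
// stands in for memory that GGAS would map in from another node's GPU, so only
// the programming model (GPU-initiated put + notification flag) is shown.
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Kernel-side "put": one block copies the payload into the window and then
// raises a flag so the (conceptual) remote consumer knows the data is complete.
__global__ void put_and_notify(const float *src, float *remote_window,
                               volatile int *remote_flag, int n) {
    for (int i = threadIdx.x; i < n; i += blockDim.x)
        remote_window[i] = src[i];
    __syncthreads();               // all puts of this block have been issued
    __threadfence_system();        // order the data before the notification
    if (threadIdx.x == 0) *remote_flag = 1;
}

int main() {
    const int n = 1024;
    std::vector<float> host(n, 2.0f);

    float *src, *window;
    int *flag;
    cudaMalloc((void **)&src, n * sizeof(float));
    cudaMalloc((void **)&window, n * sizeof(float));   // stand-in for remote memory
    cudaMalloc((void **)&flag, sizeof(int));
    cudaMemcpy(src, host.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(flag, 0, sizeof(int));

    put_and_notify<<<1, 256>>>(src, window, flag, n);
    cudaDeviceSynchronize();

    int done = 0;
    cudaMemcpy(&done, flag, sizeof(int), cudaMemcpyDeviceToHost);
    printf("notification flag = %d (1 means the consumer may read the window)\n", done);

    cudaFree(src);
    cudaFree(window);
    cudaFree(flag);
    return 0;
}
```

On real GGAS hardware the window would be backed by the interconnect rather than by local device memory, the intent being to let GPUs on different hosts communicate without CPU assistance, as the abstract describes.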

Cited by 31 publications (10 citation statements)
References 12 publications
“…For larger messages, GGAS and MPI do not differ strongly, though GGAS is still slightly better. However, this corresponds to our previous results [1], which show that GGAS performs best for small and medium-sized data transfers.…”
Section: Comparison With MPI (supporting)
confidence: 91%
“…The second one is required after the broadcast operation. However, although GGAS allows fast synchronization between the GPUs [1], every synchronization requires additional data transfers and adds overhead to the application. Especially for small sizes, this synchronization overhead may surpass the data transfer latency.…”
Section: B. Reduction With Remote Writes (mentioning)
confidence: 99%
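
The overhead this statement describes can be made concrete with a small, hypothetical CUDA sketch: before a GPU may reduce a payload written into its window, it has to wait for a notification flag, and for small messages that extra flag traffic can rival the payload transfer itself. The host-mapped flag, the wait_and_reduce kernel, and all buffer names below are assumptions for illustration, not part of GGAS.

```cuda
// Hypothetical sketch of the synchronization pattern discussed above, NOT GGAS
// itself: before the consumer GPU may reduce a payload, it spins on a
// notification flag. The flag write is an extra transfer, and for very small
// payloads this synchronization can cost as much as the data movement.
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// One block waits for the producer's flag, then sums the received payload.
__global__ void wait_and_reduce(volatile int *ready_flag, const float *window,
                                float *result, int n) {
    __shared__ float partial[256];

    if (threadIdx.x == 0) {
        while (*ready_flag == 0) { }          // the wait the citing paper refers to
    }
    __syncthreads();

    float acc = 0.0f;
    for (int i = threadIdx.x; i < n; i += blockDim.x) acc += window[i];
    partial[threadIdx.x] = acc;
    __syncthreads();
    for (int s = blockDim.x / 2; s > 0; s /= 2) {   // block-local tree reduction
        if (threadIdx.x < s) partial[threadIdx.x] += partial[threadIdx.x + s];
        __syncthreads();
    }
    if (threadIdx.x == 0) *result = partial[0];
}

int main() {
    const int n = 64;                          // small message: sync cost dominates
    std::vector<float> host(n, 1.0f);

    float *window, *result;
    cudaMalloc((void **)&window, n * sizeof(float));
    cudaMalloc((void **)&result, sizeof(float));
    cudaMemcpy(window, host.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    // Host-mapped flag, so the "remote" notification can arrive while the
    // consumer kernel is already spinning on it.
    int *flag_host, *flag_dev;
    cudaHostAlloc((void **)&flag_host, sizeof(int), cudaHostAllocMapped);
    *flag_host = 0;
    cudaHostGetDevicePointer((void **)&flag_dev, flag_host, 0);

    wait_and_reduce<<<1, 256>>>(flag_dev, window, result, n);
    *flag_host = 1;                            // stands in for the peer's remote flag write
    cudaDeviceSynchronize();

    float sum = 0.0f;
    cudaMemcpy(&sum, result, sizeof(float), cudaMemcpyDeviceToHost);
    printf("reduced sum = %.1f (expected %d)\n", sum, n);

    cudaFree(window);
    cudaFree(result);
    cudaFreeHost(flag_host);
    return 0;
}
```
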
“…Similarly, MIC-RO [13] enables sharing and using multiple Intel Many Integrated Core (MIC) cards across nodes. The work in [14] proposed concepts that allow an HCA to access GPU memory, similar to GDR. However, their concepts require specific hardware and cannot be applied to production-ready HPC systems.…”
Section: Related Work (mentioning)
confidence: 99%