2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
DOI: 10.1109/ipdpsw.2016.44

Topology-Aware GPU Selection on Multi-GPU Nodes

Cited by 23 publications (8 citation statements)
References 11 publications
“…Thus, we utilize a fixed application topology graph for allocation decisions. Application communication patterns can be manually specified by the programmer, or can be automatically extracted through program analysis or profiling [16,18,59,70]. We will outline how each can be performed in the remainder of this subsection.…”
Section: Application Topology (mentioning; confidence: 99%)
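The profiling route mentioned in the statement above can be illustrated with a small PMPI interposition sketch; the wrapper below, with its hypothetical MAX_RANKS limit and send-only accounting, is an illustrative simplification rather than the method of any of the cited works.

```c
/* Minimal sketch (not from the cited works) of profiling-based topology
 * extraction via the standard PMPI interposition interface: each rank
 * records how many bytes it sends to each peer, giving its row of the
 * application communication matrix. */
#include <mpi.h>

#define MAX_RANKS 64
static long long bytes_sent[MAX_RANKS];   /* this rank's row of the matrix */

int MPI_Send(const void *buf, int count, MPI_Datatype dtype, int dest,
             int tag, MPI_Comm comm)
{
    int size;
    MPI_Type_size(dtype, &size);
    if (dest >= 0 && dest < MAX_RANKS)
        bytes_sent[dest] += (long long)count * size;
    return PMPI_Send(buf, count, dtype, dest, tag, comm);  /* forward to MPI */
}
```

Linking such a wrapper ahead of the application captures point-to-point volumes without source changes; collectives and nonblocking sends would need analogous wrappers, and the per-rank rows can be gathered at the end of the run to form the full topology graph.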
“…With CUDA-aware MPI [10], these GPU-to-GPU communications can be handled directly over NVLink without going through the host. While source code analysis of MPI calls can explicitly identify the communication pattern, many recent works have aimed to automatically identify MPI application topologies [16,18], or to automatically identify communication through compiler-assisted skeletons [59,70].…”
Section: Application Topology (mentioning; confidence: 99%)
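As a rough illustration of the CUDA-aware MPI behavior described in that statement (assuming an MPI build with CUDA support; the rank-to-GPU mapping and buffer size below are arbitrary), device pointers obtained from cudaMalloc can be handed directly to MPI point-to-point calls:

```c
/* Sketch assuming a CUDA-aware MPI build: device pointers from cudaMalloc are
 * passed directly to MPI_Send/MPI_Recv, so the library can move the data over
 * NVLink/PCIe (or GPUDirect) without an explicit host staging copy. */
#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    cudaSetDevice(rank);                      /* naive mapping: rank i -> GPU i */
    int n = 1 << 20;
    float *dbuf;
    cudaMalloc((void **)&dbuf, (size_t)n * sizeof(float));

    if (rank == 0)
        MPI_Send(dbuf, n, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);       /* device ptr */
    else if (rank == 1)
        MPI_Recv(dbuf, n, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    cudaFree(dbuf);
    MPI_Finalize();
    return 0;
}
```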
“…Other systems typically arrange multiple GPUs in a hierarchical topology, i.e. a binary fat-tree, where the physical distance between a GPU pair can have a noticeable impact on communication efficiency [Faraji et al. 2016]. Data transfer between a GPU pair with greater physical distance must traverse more switches and longer paths, thereby resulting in lower memory bandwidth.…”
Section: Interconnect Between Multiple GPUs (mentioning; confidence: 99%)
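A coarse, programmatic proxy for this distance information is the CUDA peer-access query; the sketch below only probes which GPU pairs can bypass the host, and is not the selection scheme of the cited paper.

```c
/* Sketch: probing which GPU pairs support direct peer access. Pairs connected
 * below the same switch typically report peer access, while pairs that must
 * cross the host bridge may not; this is only a coarse proxy for distance. */
#include <cuda_runtime.h>
#include <stdio.h>

int main(void)
{
    int n = 0;
    cudaGetDeviceCount(&n);
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) {
            if (i == j) continue;
            int can = 0;
            cudaDeviceCanAccessPeer(&can, i, j);
            printf("GPU %d -> GPU %d : peer access %s\n", i, j, can ? "yes" : "no");
        }
    return 0;
}
```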
“…GPUs are interconnected with switches (e.g., PCI-e internal switches, the PCI-e host bridge, etc.) so that they can communicate with each other. GPU pairs at a shorter distance achieve higher memory bandwidth and lower latency [Faraji et al. 2016]. For example, GPU0 and GPU1 can communicate directly through Switch1, while communication between GPU0 and GPU2 must traverse Switch1, Switch0, and Switch2 and is therefore more time-consuming.…”
Section: Overhead Analysis (mentioning; confidence: 99%)
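The distance effect described here can also be observed empirically with a pairwise copy benchmark; in the following sketch the 64 MiB buffer size and event-based timing are illustrative choices, not values from the cited works.

```c
/* Sketch: measuring effective copy bandwidth for every GPU pair to expose the
 * distance effect (e.g., GPU0<->GPU1 through one switch vs. GPU0<->GPU2 across
 * the host bridge). */
#include <cuda_runtime.h>
#include <stdio.h>

int main(void)
{
    int n = 0;
    cudaGetDeviceCount(&n);
    size_t bytes = 64UL << 20;                         /* 64 MiB test buffer */

    for (int src = 0; src < n; ++src)
        for (int dst = 0; dst < n; ++dst) {
            if (src == dst) continue;
            void *s, *d;
            cudaSetDevice(src); cudaMalloc(&s, bytes);
            cudaSetDevice(dst); cudaMalloc(&d, bytes);

            cudaSetDevice(src);                        /* time on src's stream */
            cudaEvent_t t0, t1;
            cudaEventCreate(&t0); cudaEventCreate(&t1);
            cudaEventRecord(t0);
            cudaMemcpyPeer(d, dst, s, src, bytes);
            cudaEventRecord(t1);
            cudaEventSynchronize(t1);

            float ms = 0.0f;
            cudaEventElapsedTime(&ms, t0, t1);
            printf("GPU %d -> GPU %d : %.1f GB/s\n", src, dst, bytes / (ms * 1e6));

            cudaEventDestroy(t0); cudaEventDestroy(t1);
            cudaSetDevice(src); cudaFree(s);
            cudaSetDevice(dst); cudaFree(d);
        }
    return 0;
}
```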
“…Faraji et al. [13] propose a topology-aware GPU selection scheme to assign GPU devices to MPI processes based on the GPU-to-GPU communication pattern and the physical characteristics of a multi-GPU machine. With profile information from the MPI application, it allocates GPUs by performing a graph mapping algorithm using the SCOTCH library.…”
Section: Related Work (mentioning; confidence: 99%)
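For intuition only, a toy version of such a graph mapping can be phrased as a search over process-to-GPU assignments that minimizes total communication cost; the sketch below uses made-up communication and distance matrices and exhaustive enumeration in place of the SCOTCH mapper referenced in [13].

```c
/* Toy sketch (not the cited paper's method): map MPI processes to GPUs so as
 * to minimize cost = sum over (i,j) of comm[i][j] * dist[map[i]][map[j]],
 * where comm holds profiled communication volumes and dist holds GPU-pair
 * hop counts. All matrix values below are made up for illustration. */
#include <stdio.h>

#define N 4   /* processes == GPUs in this toy example */

static int comm[N][N] = { {0, 9, 1, 1}, {9, 0, 1, 1},
                          {1, 1, 0, 9}, {1, 1, 9, 0} };   /* hypothetical volumes */
static int dist[N][N] = { {0, 1, 2, 2}, {1, 0, 2, 2},
                          {2, 2, 0, 1}, {2, 2, 1, 0} };   /* hypothetical hops  */

static int cost(const int map[N])
{
    int c = 0;
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            c += comm[i][j] * dist[map[i]][map[j]];
    return c;
}

static int best_cost = 1 << 30, best_map[N];

/* Enumerate all permutations of N GPUs and keep the cheapest mapping. */
static void search(int map[N], int used, int depth)
{
    if (depth == N) {
        int c = cost(map);
        if (c < best_cost) {
            best_cost = c;
            for (int i = 0; i < N; ++i) best_map[i] = map[i];
        }
        return;
    }
    for (int g = 0; g < N; ++g)
        if (!(used & (1 << g))) {
            map[depth] = g;
            search(map, used | (1 << g), depth + 1);
        }
}

int main(void)
{
    int map[N];
    search(map, 0, 0);
    for (int i = 0; i < N; ++i)
        printf("process %d -> GPU %d\n", i, best_map[i]);
    printf("total cost: %d\n", best_cost);
    return 0;
}
```

With the toy inputs, the heavily communicating process pairs (0,1) and (2,3) end up on adjacent GPU pairs; a real mapper such as SCOTCH replaces the exhaustive search with scalable graph-mapping heuristics.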