2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
DOI: 10.1109/ipdpsw.2016.44

Topology-Aware GPU Selection on Multi-GPU Nodes

Cited by 23 publications (8 citation statements)
References 11 publications
“…Thus, we utilize a fixed application topology graph for allocation decisions. Application communication patterns can be manually specified by the programmer, or can be automatically extracted through program analysis or profiling [16,18,59,70]. We will outline how each can be performed in the remainder of this subsection.…”
Section: Application Topology (mentioning; confidence: 99%)
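The profiling route mentioned in the statement above can be illustrated with a small PMPI interposition sketch; the wrapper below, with its hypothetical MAX_RANKS limit and send-only accounting, is an illustrative simplification rather than the method of any of the cited works.

```c
/* Minimal sketch (not from the cited works) of profiling-based topology
 * extraction via the standard PMPI interposition interface: each rank
 * records how many bytes it sends to each peer, giving its row of the
 * application communication matrix. */
#include <mpi.h>

#define MAX_RANKS 64
static long long bytes_sent[MAX_RANKS];   /* this rank's row of the matrix */

int MPI_Send(const void *buf, int count, MPI_Datatype dtype, int dest,
             int tag, MPI_Comm comm)
{
    int size;
    MPI_Type_size(dtype, &size);
    if (dest >= 0 && dest < MAX_RANKS)
        bytes_sent[dest] += (long long)count * size;
    return PMPI_Send(buf, count, dtype, dest, tag, comm);  /* forward to MPI */
}
```

Linking such a wrapper ahead of the application captures point-to-point volumes without source changes; collectives and nonblocking sends would need analogous wrappers, and the per-rank rows can be gathered at the end of the run to form the full topology graph.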
“…With CUDA-aware MPI [10], these GPU-to-GPU communications can be handled directly over NVLink without going through the host. While source code analysis of MPI calls can explicitly identify the communication pattern, many recent works have aimed to automatically identify MPI application topologies [16,18], or to automatically identify communication through compiler-assisted skeletons [59,70].…”
Section: Application Topology (mentioning; confidence: 99%)
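As a rough illustration of the CUDA-aware MPI behavior described in that statement (assuming an MPI build with CUDA support; the rank-to-GPU mapping and buffer size below are arbitrary), device pointers obtained from cudaMalloc can be handed directly to MPI point-to-point calls:

```c
/* Sketch assuming a CUDA-aware MPI build: device pointers from cudaMalloc are
 * passed directly to MPI_Send/MPI_Recv, so the library can move the data over
 * NVLink/PCIe (or GPUDirect) without an explicit host staging copy. */
#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    cudaSetDevice(rank);                      /* naive mapping: rank i -> GPU i */
    int n = 1 << 20;
    float *dbuf;
    cudaMalloc((void **)&dbuf, (size_t)n * sizeof(float));

    if (rank == 0)
        MPI_Send(dbuf, n, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);       /* device ptr */
    else if (rank == 1)
        MPI_Recv(dbuf, n, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    cudaFree(dbuf);
    MPI_Finalize();
    return 0;
}
```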
“…Other systems typically arrange multiple GPUs in a hierarchical topology, i.e. a binary fat-tree, where the physical distance between a GPU pair can have a noticeable impact on communication efficiency [Faraji et al. 2016]. Data transfer between a GPU pair with greater physical distance must traverse more switches and longer paths, thereby resulting in lower memory bandwidth.…”
Section: Interconnect Between Multiple GPUs (mentioning; confidence: 99%)
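A coarse, programmatic proxy for this distance information is the CUDA peer-access query; the sketch below only probes which GPU pairs can bypass the host, and is not the selection scheme of the cited paper.

```c
/* Sketch: probing which GPU pairs support direct peer access. Pairs connected
 * below the same switch typically report peer access, while pairs that must
 * cross the host bridge may not; this is only a coarse proxy for distance. */
#include <cuda_runtime.h>
#include <stdio.h>

int main(void)
{
    int n = 0;
    cudaGetDeviceCount(&n);
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) {
            if (i == j) continue;
            int can = 0;
            cudaDeviceCanAccessPeer(&can, i, j);
            printf("GPU %d -> GPU %d : peer access %s\n", i, j, can ? "yes" : "no");
        }
    return 0;
}
```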
“…GPUs are interconnected with switches (e.g., PCI-e internal switches, the PCI-e host bridge, etc.) so that they can communicate with each other. GPU pairs at a shorter distance achieve higher memory bandwidth and lower latency [Faraji et al. 2016]. For example, GPU0 and GPU1 can communicate directly through Switch1, while communication between GPU0 and GPU2 must traverse Switch1, Switch0, and Switch2 and is therefore more time-consuming.…”
Section: Overhead Analysis (mentioning; confidence: 99%)
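The distance effect described here can also be observed empirically with a pairwise copy benchmark; in the following sketch the 64 MiB buffer size and event-based timing are illustrative choices, not values from the cited works.

```c
/* Sketch: measuring effective copy bandwidth for every GPU pair to expose the
 * distance effect (e.g., GPU0<->GPU1 through one switch vs. GPU0<->GPU2 across
 * the host bridge). */
#include <cuda_runtime.h>
#include <stdio.h>

int main(void)
{
    int n = 0;
    cudaGetDeviceCount(&n);
    size_t bytes = 64UL << 20;                         /* 64 MiB test buffer */

    for (int src = 0; src < n; ++src)
        for (int dst = 0; dst < n; ++dst) {
            if (src == dst) continue;
            void *s, *d;
            cudaSetDevice(src); cudaMalloc(&s, bytes);
            cudaSetDevice(dst); cudaMalloc(&d, bytes);

            cudaSetDevice(src);                        /* time on src's stream */
            cudaEvent_t t0, t1;
            cudaEventCreate(&t0); cudaEventCreate(&t1);
            cudaEventRecord(t0);
            cudaMemcpyPeer(d, dst, s, src, bytes);
            cudaEventRecord(t1);
            cudaEventSynchronize(t1);

            float ms = 0.0f;
            cudaEventElapsedTime(&ms, t0, t1);
            printf("GPU %d -> GPU %d : %.1f GB/s\n", src, dst, bytes / (ms * 1e6));

            cudaEventDestroy(t0); cudaEventDestroy(t1);
            cudaSetDevice(src); cudaFree(s);
            cudaSetDevice(dst); cudaFree(d);
        }
    return 0;
}
```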
“…Faraji et al. [13] propose a topology-aware GPU selection scheme to assign GPU devices to MPI processes based on the GPU-to-GPU communication pattern and the physical characteristics of a multi-GPU machine. With profile information from the MPI application, it allocates GPUs by performing a graph mapping algorithm using the SCOTCH library.…”
Section: Related Work (mentioning; confidence: 99%)
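For intuition only, a toy version of such a graph mapping can be phrased as a search over process-to-GPU assignments that minimizes total communication cost; the sketch below uses made-up communication and distance matrices and exhaustive enumeration in place of the SCOTCH mapper referenced in [13].

```c
/* Toy sketch (not the cited paper's method): map MPI processes to GPUs so as
 * to minimize cost = sum over (i,j) of comm[i][j] * dist[map[i]][map[j]],
 * where comm holds profiled communication volumes and dist holds GPU-pair
 * hop counts. All matrix values below are made up for illustration. */
#include <stdio.h>

#define N 4   /* processes == GPUs in this toy example */

static int comm[N][N] = { {0, 9, 1, 1}, {9, 0, 1, 1},
                          {1, 1, 0, 9}, {1, 1, 9, 0} };   /* hypothetical volumes */
static int dist[N][N] = { {0, 1, 2, 2}, {1, 0, 2, 2},
                          {2, 2, 0, 1}, {2, 2, 1, 0} };   /* hypothetical hops  */

static int cost(const int map[N])
{
    int c = 0;
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            c += comm[i][j] * dist[map[i]][map[j]];
    return c;
}

static int best_cost = 1 << 30, best_map[N];

/* Enumerate all permutations of N GPUs and keep the cheapest mapping. */
static void search(int map[N], int used, int depth)
{
    if (depth == N) {
        int c = cost(map);
        if (c < best_cost) {
            best_cost = c;
            for (int i = 0; i < N; ++i) best_map[i] = map[i];
        }
        return;
    }
    for (int g = 0; g < N; ++g)
        if (!(used & (1 << g))) {
            map[depth] = g;
            search(map, used | (1 << g), depth + 1);
        }
}

int main(void)
{
    int map[N];
    search(map, 0, 0);
    for (int i = 0; i < N; ++i)
        printf("process %d -> GPU %d\n", i, best_map[i]);
    printf("total cost: %d\n", best_cost);
    return 0;
}
```

With the toy inputs, the heavily communicating process pairs (0,1) and (2,3) end up on adjacent GPU pairs; a real mapper such as SCOTCH replaces the exhaustive search with scalable graph-mapping heuristics.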