Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis 2017
DOI: 10.1145/3126908.3126933
Topology-aware GPU scheduling for learning workloads in cloud environments

Abstract: Recent advances in hardware, such as systems with multiple GPUs and their availability in the cloud, are enabling deep learning in various domains including health care, autonomous vehicles, and Internet of Things. Multi-GPU systems exhibit complex connectivity among GPUs and between GPUs and CPUs. Workload schedulers must consider hardware topology and workload communication requirements in order to allocate CPU and GPU resources for optimal execution time and improved utilization in shared cloud environments…
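A minimal sketch of what topology-aware placement can look like in practice is shown below. The hop-count matrix, the 4-GPU node size, and the function names are illustrative assumptions, not the paper's actual policy; the point is only that candidate GPU sets are scored by their pairwise interconnect distance.

# Hypothetical topology-aware placement: choose the free GPUs whose
# pairwise "distance" in a device-interconnect matrix is smallest, so a
# communication-heavy job lands on NVLink-connected GPUs rather than
# GPUs reached across the PCIe/CPU interconnect path.
from itertools import combinations

# Assumed distances for a 4-GPU node: 1 = NVLink neighbour,
# 2 = same socket via PCIe, 3 = across the CPU interconnect.
HOPS = [
    [0, 1, 2, 3],
    [1, 0, 3, 2],
    [2, 3, 0, 1],
    [3, 2, 1, 0],
]

def placement_cost(gpus):
    # Sum of pairwise distances for a candidate GPU set.
    return sum(HOPS[a][b] for a, b in combinations(gpus, 2))

def best_placement(free_gpus, gpus_needed):
    # Return the free-GPU subset with the lowest communication cost.
    return min(combinations(free_gpus, gpus_needed), key=placement_cost)

print(best_placement(free_gpus=[0, 1, 2, 3], gpus_needed=2))  # -> (0, 1)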

Cited by 48 publications (35 citation statements). References 28 publications.
“…As regards DL training, Xiao et al. [26] propose Gandiva, a scheduling framework that improves latency when training DL models on a GPU cluster by exploiting the heterogeneity and recurrent behavior of DL jobs as they run mini-batch iterations. Finally, a seminal work on scheduling multiple GPUs among competing jobs on high-end servers is [27]. That paper proposes a topology-aware scheduling policy for DL jobs in cloud environments: a placement strategy that schedules jobs on an NVLink-based Power8 machine so as to satisfy workload requirements while also preventing application interference.…”
Section: Related Work
confidence: 99%
“…As in other literature proposals, the job inter-arrival times have been generated according to Poisson distributions [27]. Two classes of instances have been generated, which differ in the mean of the distribution: 30s for the first class and 45s for the second class.…”
Section: Experimental Setup
confidence: 99%
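The arrival model quoted above can be reproduced in a few lines: a Poisson arrival process has exponentially distributed inter-arrival times, so drawing exponential gaps with means of 30s and 45s yields the two workload classes. This is an illustrative reconstruction, not the authors' generator; the seed and job counts are arbitrary.

import numpy as np

rng = np.random.default_rng(seed=42)

def arrival_times(mean_interarrival_s, n_jobs):
    # Cumulative submission times of n_jobs under a Poisson process.
    gaps = rng.exponential(scale=mean_interarrival_s, size=n_jobs)
    return np.cumsum(gaps)

class_1 = arrival_times(mean_interarrival_s=30.0, n_jobs=100)  # first class
class_2 = arrival_times(mean_interarrival_s=45.0, n_jobs=100)  # second class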
“…Many of them are interrelated, such as L1/L2/L3 memory cache misses, but they are still relevant for many performance-driven decisions. An example of this importance is the emerging effort to develop novel NUMA-aware [23] and GPU-topology-aware [24] placement strategies for big-data and deep-learning workloads, because the topology of modern processors is extremely complex and can significantly impact application performance. Furthermore, works like [25] (which use the same features as we do in our experiments) apply dimensionality reduction in order to facilitate parameter estimation.…”
Section: Motivation
confidence: 99%
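The dimensionality-reduction step mentioned in that excerpt can be illustrated with a standard PCA pass over correlated counter features; the feature matrix and component count below are assumptions made for illustration, not details taken from [25].

import numpy as np
from sklearn.decomposition import PCA

# rows = profiled runs, columns = raw hardware counters
# (L1/L2/L3 cache misses, instructions retired, ...); synthetic data here.
counters = np.random.default_rng(0).random((200, 12))

pca = PCA(n_components=3)              # keep the three strongest directions
reduced = pca.fit_transform(counters)  # 200 x 3 matrix for model fitting
print(pca.explained_variance_ratio_)   # variance retained per component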
“…One cause of such underutilization is that DL resource managers disallow co-location of multiple DL jobs within the same GPU [25,13]; a characteristic shared with other resource managers, such as Kubernetes and YARN, that were originally designed for CPU-based workloads [16,35]. Instead, most DL resource managers focus on reducing network latency and improving locality [3,25,13]. This inability to co-locate DL jobs within the same GPU results in reduced resource utilization, longer queuing times, and reduced cost efficiency in DL systems.…”
Section: Introduction
confidence: 99%
“…Recent DL resource managers have been proposed that make placement decisions by consolidating DL jobs onto fewer machines to minimize workload and job completion time (JCT) [3,37,13]. However, while DL resource managers now exist that allow co-location [37,29], little attention has been drawn to the performance interference that arises between multiple DL jobs training within the same GPU.…”
Section: Introduction
confidence: 99%
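The consolidation strategy these excerpts describe can be sketched as a tightest-fit placement rule: put each job on the node that still fits it but has the fewest free GPUs left, so work packs onto fewer machines. The data structures and tie-breaking rule below are illustrative, not any specific resource manager's algorithm.

def consolidate(job_gpus, nodes):
    # nodes: {name: {"total": int, "used": int}}; returns the chosen node
    # name, or None if the job must queue.
    feasible = [
        (name, state) for name, state in nodes.items()
        if state["total"] - state["used"] >= job_gpus
    ]
    if not feasible:
        return None
    # Tightest fit: prefer the node with the fewest free GPUs remaining.
    name, state = min(feasible, key=lambda ns: ns[1]["total"] - ns[1]["used"])
    state["used"] += job_gpus
    return name

cluster = {"node-a": {"total": 4, "used": 2}, "node-b": {"total": 4, "used": 0}}
print(consolidate(2, cluster))  # -> "node-a": packs onto the busier node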