2022
DOI: 10.1109/tpds.2021.3079202
Horus: Interference-Aware and Prediction-Based Scheduling in Deep Learning Systems

Abstract: To accelerate the training of Deep Learning (DL) models, clusters of machines equipped with hardware accelerators such as GPUs are leveraged to reduce execution time. State-of-the-art resource managers are needed to increase GPU utilization and maximize throughput. While co-locating DL jobs on the same GPU has been shown to be effective, this can incur interference, causing slowdown. In this paper we propose Horus: an interference-aware and prediction-based resource manager for DL systems. Horus proactively pre…

Cited by 47 publications (31 citation statements) | References 40 publications
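The abstract describes prediction-based, interference-aware co-location: a predictor estimates each job's GPU usage, and the scheduler places a job only where the predicted slowdown stays within budget. Below is a minimal, hypothetical sketch of that idea; the Job/GPU fields, the toy interference model, and the max_slowdown budget are illustrative assumptions, not Horus's published algorithm.

```python
# Hypothetical sketch of interference-aware, prediction-based placement.
# All names and the interference model are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class Job:
    name: str
    predicted_util: float  # predicted GPU utilization in [0, 1]

@dataclass
class GPU:
    gpu_id: int
    jobs: list = field(default_factory=list)

    def total_util(self) -> float:
        return sum(j.predicted_util for j in self.jobs)

def predicted_slowdown(gpu: GPU, job: Job) -> float:
    """Toy interference model: slowdown appears once the combined
    predicted utilization exceeds the device's capacity."""
    return max(0.0, gpu.total_util() + job.predicted_util - 1.0)

def place(job: Job, gpus: list, max_slowdown: float = 0.2):
    """Co-locate the job on the GPU with the lowest predicted
    interference; return None (i.e., queue the job) if every
    option exceeds the slowdown budget."""
    best = min(gpus, key=lambda g: predicted_slowdown(g, job))
    if predicted_slowdown(best, job) > max_slowdown:
        return None  # defer rather than co-locate destructively
    best.jobs.append(job)
    return best
```

With a pool of GPUs and a stream of jobs, place() fills devices up to the budget and defers the rest, mirroring the proactive co-location decision the abstract describes.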
“…used and the best available configuration (VM type) to be assigned to each selected node is solved. As in other literature proposals [14], [17]–[19], we assume that multiple jobs can be deployed on the same node, while, within each node, each job receives for exclusive use a certain number of GPUs. As will be observed in Section 4.6, the interference experienced among jobs in the same VM is negligible in our setting.…”
Section: System Architecture and Problem Statement
confidence: 99%
“…Moreover, they propose a dynamic programming-based heuristic algorithm to determine an effective resource allocation, while jobs are scheduled relying on a FIFO mechanism. Finally, an interference-aware and prediction-based resource manager is proposed in [19], where GPU utilization is identified as a proxy metric that allows to determine good placement decisions.…”
Section: Resource Selection
confidence: 99%
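The statement above pairs a placement heuristic with FIFO job ordering. A minimal sketch of such a FIFO admission loop, reusing the hypothetical place() helper from the earlier example (both are illustrations, not the cited systems' code):

```python
# Illustrative FIFO admission loop: jobs are admitted strictly in
# arrival order; a head-of-line job that cannot be placed within the
# slowdown budget waits, blocking those behind it.
from collections import deque

def fifo_schedule(pending: deque, gpus: list) -> None:
    while pending:
        if place(pending[0], gpus) is None:
            break  # head-of-line job must wait for capacity
        pending.popleft()
```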
“…As F is microservice-specific, each key component of DLRA will be profiled by the DLRA Master. We pre-train the prediction model in an offline training stage, similarly to existing approaches [36], [18], [37], based on a set of workload benchmarking and profiling, but will update the model parameters periodically according to the on-the-fly resource usage.…”
Section: QoS Prediction Engine
confidence: 99%
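The pattern described here, pretraining offline on profiling data and then refreshing parameters periodically from live usage, can be sketched as follows. The linear model, least-squares fit, and gradient-step refresh are assumptions for illustration, not the cited system's implementation.

```python
# Hedged sketch of the offline-pretrain / online-refresh pattern.
# A linear utilization predictor stands in for the real model.
import numpy as np

class UtilPredictor:
    def __init__(self):
        self.w = None  # model parameters

    def pretrain(self, X: np.ndarray, y: np.ndarray) -> None:
        """Offline stage: fit on benchmarking/profiling data."""
        self.w, *_ = np.linalg.lstsq(X, y, rcond=None)

    def update(self, X_live: np.ndarray, y_live: np.ndarray,
               lr: float = 0.01) -> None:
        """Periodic refresh from on-the-fly resource usage:
        one gradient step on the mean squared error."""
        grad = X_live.T @ (X_live @ self.w - y_live) / len(y_live)
        self.w -= lr * grad

    def predict(self, x: np.ndarray) -> float:
        return float(x @ self.w)
```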
“…The ability to co-locate jobs (i.e., execute within the same CPU or GPU) has been identified as a means to address under-utilization problem. Understanding and achieving high resource utilization or high energy efficiency for heterogeneous workloads in cloud computing is an important topic [44], [57], [58], [27], [37]. Existing work on QoS management when co-locating heterogeneous workloads has two distinct categories: (i) reducing the probability of resource contention by either granting isolated execution environments to LRAs [49] [59] or adjusting task placement to reduce the resource contention on a certain node [60] [11], primarily for runtime QoS of LRA.…”
Section: Related Work
confidence: 99%
“…Alternatively, some works use data-driven approaches to make the GPU sharing decision. Horus [159,160] designs a prediction-based interference-aware mechanism that can be integrated with existing DL training scheduling frameworks. The prediction engine in Horus is in charge of estimating the GPU usage of each DL job by accessing its graph and dry running the model upon the job submission.…”
Section: GPU Sharing
confidence: 99%
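The dry-run idea, estimating a job's GPU footprint by briefly executing it before admission, can be illustrated with PyTorch's memory counters. This sketches only the general technique; Horus's actual engine also inspects the model graph, and the helper below is hypothetical.

```python
# Illustrative dry-run profiling sketch, assuming PyTorch: run a few
# forward/backward passes on dummy input before admission and record
# peak GPU memory as a cheap stand-in for the job's resource footprint.
import torch

def dry_run_footprint(model: torch.nn.Module, sample: torch.Tensor,
                      steps: int = 3) -> int:
    """Return peak GPU memory (bytes) seen over a few training steps."""
    device = torch.device("cuda")
    model = model.to(device)
    sample = sample.to(device)
    torch.cuda.reset_peak_memory_stats(device)
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    for _ in range(steps):
        opt.zero_grad()
        loss = model(sample).sum()  # dummy loss, for profiling only
        loss.backward()
        opt.step()
    torch.cuda.synchronize(device)
    return torch.cuda.max_memory_allocated(device)
```

The returned peak could then feed a placement predictor like the one sketched earlier, turning a one-off dry run into an admission-time usage estimate.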