2020
DOI: 10.1007/978-3-030-60239-0_33

Horus: An Interference-Aware Resource Manager for Deep Learning Systems

Abstract: Deep Learning (DL) models are deployed as jobs within machines containing GPUs. These DL systems, ranging from a single GPU device to machine clusters, require state-of-the-art resource management to increase resource utilization and job throughput. While co-location, in which multiple jobs share the same GPU, has been identified as an effective means to achieve this, such co-location incurs performance interference that directly degrades DL training and inference performance. Existing approaches t…
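To make the abstract's central idea concrete, here is a minimal sketch of interference-aware placement: predict the slowdown a new job would suffer on each candidate GPU and pick the least-contended one. This is not Horus's actual algorithm; the Job/GPU structures and the predict_slowdown model below are hypothetical stand-ins for a learned interference predictor.

```python
from dataclasses import dataclass, field

@dataclass
class Job:
    name: str
    util: float  # fraction of a GPU's compute used in isolation (hypothetical metric)

@dataclass
class GPU:
    gpu_id: int
    jobs: list = field(default_factory=list)

def predict_slowdown(new_job, resident):
    """Hypothetical interference model: slowdown grows with the total
    utilization of co-located jobs. A real manager would learn this
    from profiled co-location measurements."""
    total = new_job.util + sum(j.util for j in resident)
    return max(1.0, total)  # 1.0 means no predicted slowdown

def place(job, gpus):
    """Place the job on the GPU with the lowest predicted interference."""
    best = min(gpus, key=lambda g: predict_slowdown(job, g.jobs))
    best.jobs.append(job)
    return best

gpus = [GPU(0), GPU(1)]
place(Job("resnet50-train", 0.7), gpus)
chosen = place(Job("bert-infer", 0.4), gpus)
print(f"bert-infer placed on GPU {chosen.gpu_id}")  # GPU 1: less contended
```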

Cited by 6 publications (5 citation statements); references 26 publications.
“…However, these studies highlight that concurrent jobs can potentially interfere with each other, adversely affecting training performance. Furthermore, the extent of interference depends on the DL models themselves [19,20]. In the pursuit of identifying suitable job combinations, Gandiva employs a trial-and-error approach, while Gavel establishes a threshold for the difference between isolated training and packing decisions.…”
Section: Literature Review
confidence: 99%
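The threshold rule attributed to Gavel in the statement above can be illustrated with a short sketch: accept a packing only if every job's packed throughput stays within a tolerance of its isolated throughput. The 10% threshold and the throughput figures are hypothetical, not values from the cited papers.

```python
THRESHOLD = 0.10  # tolerate at most a 10% throughput drop versus isolation

def should_pack(isolated, packed):
    """Both dicts map job name -> measured throughput (samples/s)."""
    for job, iso_tput in isolated.items():
        degradation = (iso_tput - packed[job]) / iso_tput
        if degradation > THRESHOLD:
            return False
    return True

isolated = {"resnet50": 410.0, "lstm": 220.0}
packed = {"resnet50": 385.0, "lstm": 160.0}  # lstm loses ~27% to interference
print(should_pack(isolated, packed))  # False: keep the jobs on separate GPUs
```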
“…Inference Serving Systems Most modern model serving systems (e.g., Clipper [9], Amazon Sagemaker, Microsoft AzureML, INFaaS [36], Horus [40], Perseus [29]) treat ML inference as a black box. These approaches must train and manage many models to meet diverse SLOs under varying query loads.…”
Section: Related Work
confidence: 99%
“…Suitable model-variants today may fail to satisfy SLOs in the future when combined with new compute infrastructure or deployed in a new execution environment [3]. Co-location interference: Fourth, inference models are typically co-located on worker machines to improve resource utilization and reduce operating costs [29,33,36,40]. Unfortunately, model co-location introduces the opportunity for model interference, which can degrade inference latency and cause SLO violations.…”
Section: Introduction
confidence: 99%
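The co-location interference risk described above is often handled with admission control: admit a model onto a worker only if its predicted tail latency under interference still meets the SLO. The following is a minimal sketch; the fixed per-neighbor latency inflation and the numbers are hypothetical placeholders for a profiled or learned interference model.

```python
def admits(base_p99_ms, n_neighbors, slo_ms, interference_per_neighbor=0.15):
    """Admit a model only if predicted p99 latency under co-location
    meets the SLO; each neighbor is assumed (hypothetically) to inflate
    latency by a fixed fraction."""
    predicted_p99 = base_p99_ms * (1 + interference_per_neighbor * n_neighbors)
    return predicted_p99 <= slo_ms

print(admits(base_p99_ms=40.0, n_neighbors=3, slo_ms=100.0))  # True  (58 ms <= 100 ms)
print(admits(base_p99_ms=40.0, n_neighbors=3, slo_ms=50.0))   # False (58 ms > 50 ms)
```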
“…Concurrent Execution of Co-Located DL Workloads: The concurrent execution of co-located DL workloads leads to workload interference through resource contention, bandwidth bottleneck, race conditions, etc. [36]. Different workloads can be scheduled on dedicated GPUs to provide isolation to training processes.…”
Section: Introduction
confidence: 99%
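The dedicated-GPU isolation mentioned in the last statement is commonly implemented by restricting each process's device visibility with CUDA_VISIBLE_DEVICES, which CUDA honors when enumerating devices. A brief sketch follows; the training scripts are hypothetical placeholders.

```python
import os
import subprocess

# Hypothetical training scripts, each pinned to its own dedicated GPU.
jobs = [("train_resnet.py", "0"), ("train_bert.py", "1")]

procs = []
for script, gpu in jobs:
    # Each process enumerates only the device listed here, so the two
    # jobs cannot contend for the same GPU (at the cost of utilization).
    env = {**os.environ, "CUDA_VISIBLE_DEVICES": gpu}
    procs.append(subprocess.Popen(["python", script], env=env))

for p in procs:
    p.wait()
```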