A GPU Scheduling Framework to Accelerate Hyper-Parameter Optimization in Deep Learning Clusters

2021
DOI: 10.3390/electronics10030350

Abstract: This paper proposes Hermes, a container-based preemptive GPU scheduling framework for accelerating hyper-parameter optimization in deep learning (DL) clusters. Hermes accelerates hyper-parameter optimization by time-sharing between DL jobs and prioritizing jobs with more promising hyper-parameter combinations. Hermes’s scheduling policy is grounded on the observation that good hyper-parameter combinations converge quickly in the early phases of training. By giving higher priority to fast-converging containers, …
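The abstract describes the policy only at a high level. Below is a minimal, illustrative Python sketch of how a convergence-aware priority scheduler of this kind could rank containerized jobs and decide which containers keep a GPU; the names used here (DLJob, convergence_rate, schedule) are hypothetical and do not come from the Hermes paper.

```python
# Minimal sketch of a convergence-aware priority policy (illustrative only;
# not Hermes's actual implementation). A job's priority is derived from how
# fast its training loss has dropped in the early phase; jobs outside the
# top-k are preempted so higher-priority jobs can use the GPUs.

from dataclasses import dataclass, field
from typing import List


@dataclass
class DLJob:
    job_id: str
    loss_history: List[float] = field(default_factory=list)  # recent training losses
    running: bool = False

    def convergence_rate(self) -> float:
        """Average per-step loss improvement over the observed window."""
        if len(self.loss_history) < 2:
            return 0.0
        return (self.loss_history[0] - self.loss_history[-1]) / (len(self.loss_history) - 1)


def schedule(jobs: List[DLJob], num_gpus: int) -> None:
    """Give GPUs to the fastest-converging jobs; preempt the rest (time-sharing)."""
    ranked = sorted(jobs, key=lambda j: j.convergence_rate(), reverse=True)
    for rank, job in enumerate(ranked):
        should_run = rank < num_gpus
        if should_run and not job.running:
            job.running = True   # resume or start the job's container
        elif not should_run and job.running:
            job.running = False  # preempt the container, freeing its GPU


# Example: two HPO trials sharing one GPU; the faster-converging trial keeps it.
a = DLJob("trial-a", loss_history=[2.0, 1.2, 0.7])
b = DLJob("trial-b", loss_history=[2.0, 1.9, 1.8])
schedule([a, b], num_gpus=1)
print(a.running, b.running)  # True False
```

Ranking by early loss improvement mirrors the paper's observation that good hyper-parameter combinations converge quickly in the early phases of training.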

Cited by 4 publications (4 citation statements). References 13 publications.
“…Jaewon Son et al. 18 presented preliminary research on accelerating hyper-parameter optimization in deep learning (DL) clusters. The paper's main goal was to converge quickly to good hyper-parameter combinations during the initial training phase.…”
Section: Literature Survey
Mentioning confidence: 99%
“…Accuracy efficiency. Hermes [131] is a scheduler to expedite HPO workloads in GPU datacenters. It provides a container preemption mechanism to enable migration between DL jobs with minimal overhead.…”
Section: Hyperparameter Optimization Workloads
Mentioning confidence: 99%
“…Singh and Chana 20 presented an extensive survey of resource scheduling on cloud systems, covering several aspects. While some schedulers deal only with specific resources such as GPUs, [21][22][23] others consider many resources, including CPU, GPU, and memory. There are several resource schedulers for cloud environments and big data systems, such as Kubernetes, 15 Mesos, 18 and Yarn.…”
Section: Related Work
Mentioning confidence: 99%