2020
DOI: 10.1007/978-3-030-48340-1_40

Hugo: A Cluster Scheduler that Efficiently Learns to Select Complementary Data-Parallel Jobs

Abstract: Distributed data processing systems like MapReduce, Spark, and Flink are popular tools for analysis of large datasets with cluster resources. Yet, users often overprovision resources for their data processing jobs, while the resource usage of these jobs also typically fluctuates considerably. Therefore, multiple jobs usually get scheduled onto the same shared resources to increase the resource utilization and throughput of clusters. However, job runtimes and the utilization of shared resources can vary signifi…

Cited by 4 publications (5 citation statements)
References 20 publications
“…In the following we describe the method of grouping jobs, learning good co‐location of groups, and the evaluation of the scheduling decisions by Hugo. More details about the calculation of the scheduling probabilities as well as the prototype implementation can be found in the previous publication about Hugo 3 …”
Section: Cluster Scheduling Methods and Experiments
Type: mentioning
confidence: 99%
“…We acknowledge the co‐authors of our previous publications on this topic, 1‐3 especially Benjamin Rabier, Ilya Verbitskiy, and Florian Schmidt. This study was funded by German Ministry for Education and Research (BMBF) as BBDC (01IS14013A and 01IS18025A).…”
Section: Acknowledgements
Type: mentioning
confidence: 99%
“…As mentioned in previous sections, ASA X is a stateful extension of [36], where the main difference is that ASA X can incorporate previous decisions in a RL approach. In [39], the authors combine offline job classification with online RL to improve collocations. This approach can accelerate convergence, although it might have complex consequences when new unknown jobs do not fit into the initial classification.…”
Section: Related Work
Type: mentioning
confidence: 99%
“…In cloud computing ecosystems, consolidating multiple user applications onto multi-core servers generates interference between co-hosted applications, which impacts application performance. To minimize interference effects and improve application performance, a common solution is to utilize schedulers that consider interference issues [26].…”
Section: Interference-aware Scheduling
Type: mentioning
confidence: 99%