2021 IEEE International Conference on Big Data (Big Data)
DOI: 10.1109/bigdata52589.2021.9671275
On the Potential of Execution Traces for Batch Processing Workload Optimization in Public Clouds

Abstract: With the growing amount of data, data processing workloads and the management of their resource usage become increasingly important. Since managing a dedicated infrastructure is in many situations infeasible or uneconomical, users increasingly execute their respective workloads in the cloud. As the configuration of workloads and resources is often challenging, various methods have been proposed that either quickly profile towards a good configuration or determine one based on data from previous runs. Still, …

Cited by 9 publications (8 citation statements)
References 24 publications
“…In several of our prior works [16,31,33,32,20], we discussed the idea of exploiting similarities between different jobs and their executions, cultivating runtime data in a collaborative manner among numerous users and thereby improving the prediction capabilities of individual users. This includes decentralized system architectures for sharing context-aware runtime metrics, as well as similarity matching between jobs.…”
Section: Results Overview
Citation type: mentioning (confidence: 99%)
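The statement above refers to similarity matching between jobs based on shared, context-aware runtime metrics. A minimal sketch of what such matching could look like, assuming jobs are represented as simple resource-usage feature vectors compared by cosine similarity (the feature names, values, and similarity measure are illustrative assumptions, not the authors' concrete method):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two job feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def most_similar_jobs(target: np.ndarray, history: dict, k: int = 3):
    """Return the k historical jobs whose runtime profiles are closest to target."""
    scored = [(name, cosine_similarity(target, vec)) for name, vec in history.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]

# Hypothetical runtime-metric vectors: [CPU utilization, memory (GB), shuffle (GB), input (GB)]
history = {
    "wordcount-run-1": np.array([0.62, 12.0, 4.1, 50.0]),
    "pagerank-run-7":  np.array([0.85, 48.0, 30.5, 20.0]),
    "sort-run-3":      np.array([0.55, 16.0, 60.0, 60.0]),
}
new_job = np.array([0.60, 14.0, 5.0, 55.0])

print(most_similar_jobs(new_job, history, k=2))
```

In a collaborative setting, the `history` collection would be populated with runtime metrics shared by many users, so a new job can borrow runtime predictions from its closest historical matches.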
“…Both approaches have in common that they need at least comprehensive knowledge about execution times of all tasks on all available nodes. However, these values are not available in advance but must be determined either by asking users for estimates [18,22,23], by analyzing historical traces [35,36,42], or by using some form of online learning [43,45]. Lotaru aims to estimate the runtime for all task-node pairs in a cluster to enable the use of existing scheduling methods in real-world systems.…”
Section: Scheduling Workflow Tasks Onto Heterogeneous Clusters
Citation type: mentioning (confidence: 99%)
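The cited scheduling approaches assume a runtime estimate for every task-node pair. A minimal sketch of how such a task-node runtime matrix could be derived from historical traces, assuming a measured baseline runtime per task and a per-node speed factor with linear scaling (task names, node names, and the scaling rule are illustrative assumptions, not Lotaru's actual estimation method):

```python
# Hypothetical baseline runtimes (seconds) measured on a reference node,
# e.g. extracted from historical traces of earlier workflow executions.
baseline_runtime = {"fastqc": 120.0, "align": 900.0, "sort": 300.0}

# Hypothetical per-node speed factors; > 1.0 means faster than the reference node.
node_speed = {"node-a": 1.0, "node-b": 1.6, "node-c": 0.7}

def estimate_runtimes(baseline: dict, speed: dict) -> dict:
    """Estimate the runtime of every task on every node by linear scaling."""
    return {
        (task, node): seconds / factor
        for task, seconds in baseline.items()
        for node, factor in speed.items()
    }

# The resulting task-node matrix is the input that heterogeneity-aware schedulers consume.
for (task, node), runtime in sorted(estimate_runtimes(baseline_runtime, node_speed).items()):
    print(f"{task:>7} on {node}: {runtime:7.1f} s")
```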
“…Some approaches use runtime data to predict the job's scale-out and runtime behavior. This data is gained either from dedicated profiling or previous full executions [7], [25]- [31]. The models can then be used to predict the execution performance for different cluster configurations, and the most resource-efficient one will be chosen.…”
Section: A. Approaches Based On Historical Performance Data
Citation type: mentioning (confidence: 99%)
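The quoted passage describes fitting models to runtime data from profiling or previous full executions and using them to choose a resource-efficient cluster configuration. A minimal sketch of this idea, assuming a simple parametric scale-out model fitted by least squares and a runtime deadline as the selection criterion (the model form, data points, and deadline are illustrative assumptions, not the exact models of the cited works):

```python
import numpy as np

# Hypothetical (scale-out, runtime in seconds) observations from previous runs or profiling.
scale_outs = np.array([2.0, 4.0, 8.0, 16.0])
runtimes = np.array([1900.0, 1050.0, 640.0, 460.0])

# Fit a simple parametric model runtime(n) = a + b/n + c*log(n) by least squares.
X = np.column_stack([np.ones_like(scale_outs), 1.0 / scale_outs, np.log(scale_outs)])
coef, *_ = np.linalg.lstsq(X, runtimes, rcond=None)

def predict_runtime(n: float) -> float:
    """Predicted runtime for a cluster with n nodes."""
    return float(coef[0] + coef[1] / n + coef[2] * np.log(n))

# Pick the smallest scale-out whose predicted runtime still meets a (hypothetical) deadline,
# i.e. the most resource-efficient configuration that satisfies the target.
deadline_s = 700.0
feasible = [n for n in range(2, 33) if predict_runtime(n) <= deadline_s]
print("smallest sufficient scale-out:", min(feasible) if feasible else "none found")
```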