Jonathan Bader scite author profile

Jonathan Bader

5Publications

83Citation Statements Received

104Citation Statements Given

How they've been cited

How they cite others

135

102

Affiliations

Technische Universität Berlin, Boehringer Ingelheim (United States), Australian National University

Publications

Order By: Most citations

Towards Collaborative Optimization of Cluster Configurations for Distributed Dataflow Jobs

Will¹,

Bader²,

Thamsen³

2020

View full text Add to dashboard Cite

Analyzing large datasets with distributed dataflow systems requires the use of clusters. Public cloud providers offer a large variety and quantity of resources that can be used for such clusters. However, picking the appropriate resources in both type and number can often be challenging, as the selected configuration needs to match a distributed dataflow job's resource demands and access patterns. A good cluster configuration avoids hardware bottlenecks and maximizes resource utilization, avoiding costly overprovisioning.We propose a collaborative approach for finding optimal cluster configurations based on sharing and learning from historical runtime data of distributed dataflow jobs. Collaboratively shared data can be utilized to predict runtimes of future job executions through the use of specialized regression models. However, training prediction models on historical runtime data that were produced by different users and in diverse contexts requires the models to take these contexts into account.

show abstract

C3O: Collaborative Cluster Configuration Optimization for Distributed Data Processing in Public Clouds

Will

Thamsen²,

Scheinert³

et al. 2021

View full text Add to dashboard Cite

show abstract

Lotaru: Locally Estimating Runtimes of Scientific Workflow Tasks in Heterogeneous Clusters

Bader

Lehmann

Thamsen

et al. 2022

View full text Add to dashboard Cite

Many scientific workflow scheduling algorithms need to be informed about task runtimes a-priori to conduct efficient scheduling. In heterogeneous cluster infrastructures, this problem becomes aggravated because these runtimes are required for each task-node pair. Using historical data is often not feasible as logs are typically not retained indefinitely and workloads as well as infrastructure changes. In contrast, online methods, which predict task runtimes on specific nodes while the workflow is running, have to cope with the lack of example runs, especially during the start-up.In this paper, we present Lotaru, a novel online method for locally estimating task runtimes in scientific workflows on heterogeneous clusters. Lotaru first profiles all nodes of a cluster with a set of shortrunning and uniform microbenchmarks. Next, it runs the workflow to be scheduled on the user's local machine with drastically reduced data to determine important task characteristics. Based on these measurements, Lotaru learns a Bayesian linear regression model to predict a task's runtime given the input size and finally adjusts the predicted runtime specifically for each task-node pair in the cluster based on the micro-benchmark results. Due to its Bayesian approach, Lotaru can also compute robust uncertainty estimates and provides them as an input for advanced scheduling methods.Our evaluation with five real-world scientific workflows and different datasets shows that Lotaru significantly outperforms the baselines in terms of prediction errors for homogeneous and heterogeneous clusters. CCS CONCEPTS• Information systems → Information systems applications; • Computer systems organization → Distributed architectures; • Software and its engineering → Software architectures.

show abstract

Tarema: Adaptive Resource Allocation for Scalable Scientific Workflows in Heterogeneous Clusters

Bader

Thamsen

Kulagina

et al. 2021

View full text Add to dashboard Cite

Scientific workflow management systems like Nextflow support large-scale data analysis by abstracting away the details of scientific workflows. In these systems, workflows consist of several abstract tasks, of which instances are run in parallel and transform input partitions into output partitions. Resource managers like Kubernetes execute such workflow tasks on cluster infrastructures. However, these resource managers only consider the number of CPUs and the amount of available memory when assigning tasks to resources; they do not consider hardware differences beyond these numbers, while computational speed and memory access rates can differ significantly.We propose Tarema, a system for allocating task instances to heterogeneous cluster resources during the execution of scalable scientific workflows. First, Tarema profiles the available infrastructure with a set of benchmark programs and groups cluster nodes with similar performance. Second, Tarema uses online monitoring data of tasks, assigning labels to tasks depending on their resource usage. Third, Tarema uses the node groups and task labels to dynamically assign task instances evenly to resources based on resource demand. Our evaluation of a prototype implementation for Kubernetes, using five real-world Nextflow workflows from the popular nf-core framework and two 15-node clusters consisting of different virtual machines, shows a mean reduction of isolated job runtimes by 19.8% compared to popular schedulers in widely-used resource managers and 4.54% compared to the heuristic SJFN, while providing a better cluster usage. Moreover, executing two long-running workflows in parallel and on restricted resources shows that Tarema is able to reduce the runtimes even more while providing a fair cluster usage.

show abstract

On the Potential of Execution Traces for Batch Processing Workload Optimization in Public Clouds

Scheinert¹,

Alamgiralem²,

Bader³

et al. 2021

View full text Add to dashboard Cite

With the growing amount of data, data processing workloads and the management of their resource usage becomes increasingly important. Since managing a dedicated infrastructure is in many situations infeasible or uneconomical, users progressively execute their respective workloads in the cloud. As the configuration of workloads and resources is often challenging, various methods have been proposed that either quickly profile towards a good configuration or determine one based on data from previous runs. Still, performance data to train such methods is often lacking and must be costly collected.In this paper, we propose a collaborative approach for sharing anonymized workload execution traces among users, mining them for general patterns, and exploiting clusters of historical workloads for future optimizations. We evaluate our prototype implementation for mining workload execution graphs on a publicly available trace dataset and demonstrate the predictive value of workload clusters determined through traces only.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Jonathan Bader

Towards Collaborative Optimization of Cluster Configurations for Distributed Dataflow Jobs

C3O: Collaborative Cluster Configuration Optimization for Distributed Data Processing in Public Clouds

Lotaru: Locally Estimating Runtimes of Scientific Workflow Tasks in Heterogeneous Clusters

Tarema: Adaptive Resource Allocation for Scalable Scientific Workflows in Heterogeneous Clusters

On the Potential of Execution Traces for Batch Processing Workload Optimization in Public Clouds

Contact Info

Product

Resources

About