Simplifying the task of resource management and scheduling for customers, while still delivering complex Quality-of-Service (QoS), is key to cloud computing. Many autoscaling policies have been proposed in the past decade to decide on behalf of cloud customers when and how to provision resources to a cloud application utilizing cloud elasticity features. However, in prior work, when a new policy is proposed, it is seldom compared to the state-of-the-art, and is often compared only to static provisioning using a predefined QoS target. This reduces the ability of cloud customers and of cloud operators to choose and deploy an autoscaling policy. In our work, we conduct an experimental performance evaluation of autoscaling policies, using as application model workflows, a commonly used formalism for automating resource management for applications with well-defined yet complex structure. We present a detailed comparative study of general state-of-the-art autoscaling policies, along with two new workflow-specific policies. To understand the performance differences between the 7 policies, we conduct various forms of pairwise and group comparisons. We report both individual and aggregated metrics. Our results highlight the trade-offs between the suggested policies, and thus enable a better understanding of the current state-of-the-art.
Elasticity is one of the main features of cloud computing allowing customers to scale their resources based on the workload. Many autoscalers have been proposed in the past decade to decide on behalf of cloud customers when and how to provision resources to a cloud application based on the workload utilizing cloud elasticity features. However, in prior work, when a new policy is proposed, it is seldom compared to the state-of-the-art, and is often compared only to static provisioning using a predefined QoS target. This reduces the ability of cloud customers and of cloud operators to choose and deploy an autoscaling policy as there is seldom enough analysis on the performance of the autoscalers in different operating conditions and with different applications. In our work, we conduct an experimental performance evaluation of autoscaling policies, using as workflows, a commonly used formalism for automating resource management for applications with well-defined yet complex structures. We present a detailed comparative study of general state-of-the-art autoscaling policies, along with two new workflow-specific policies. To understand the performance differences between the seven policies, we conduct various experiments and compare their performance in both pairwise and group comparisons. We report both individual and aggregated metrics. As many workflows have deadline requirements on the tasks, we study the effect of autoscaling on workflow deadlines. Additionally, we look into the effect of autoscaling on the accounted and hourly-based charged costs, and evaluate performance variability caused by the autoscaler selection for each group of workflow sizes. Our results highlight the trade-offs between the suggested policies, how they can impact meeting the deadlines, and how they perform in different operating conditions, thus enabling a better understanding of the current state-of-the-art. CCS Concepts: •General and reference →Cross-computing tools and techniques; •Networks →Cloud computing; •Computer systems organization →Self-organizing autonomic computing; •Software and its engineering →Virtual machines; ACM Transactions on Modeling and Performance Evaluation of Computing Systems This is a significantly extended and more comprehensive version of our ICPE 2017 article [ 28].
Workflows are important computational tools in many branches of science, and because of the dependencies among their tasks and their widely different characteristics, scheduling them is a difficult problem. Most research on scheduling workflows has focused on the offline problem of minimizing the makespan of single workflows with known task runtimes. The problem of scheduling multiple workflows has been addressed either in an offline fashion, or still with the assumption of known task runtimes. In this paper, we study the problem of scheduling workloads consisting of an arrival stream of workflows without task runtime estimates. The resource requirements of a workflow can significantly fluctuate during its execution. Thus, we present four scheduling policies for workloads of workflows with as their main feature the extent to which they reserve processors to workflows to deal with these fluctuations. We perform simulations with realistic synthetic workloads and we show that any form of processor reservation only decreases the overall system performance and that a greedy backfilling-like policy performs best.
Nowadays, in order to keep track of the fastchanging requirements of Internet applications, auto-scaling is used as an essential mechanism for adapting the number of provisioned resources to the resource demand. The straightforward approach is to deploy a set of common and opensource single-service auto-scalers for each service independently. However, this deployment leads to problems such as bottleneckshifting and increased oscillations. Existing auto-scalers that scale applications consisting of multiple services are kept closed-source. To face these challenges, we first survey existing auto-scalers and highlight current challenges. Then, we introduce Chamulteon, a redesign of our previously introduced mechanism, which can scale applications consisting of multiple services in a coordinated manner. We evaluate Chamulteon against four different wellcited auto-scalers in four sets of measurement-based experiments where we use diverse environments (VM vs. Docker), real-world traces, and vary the scale of the demanded resources. Overall, Chamulteon achieves the best auto-scaling performance based on established user-oriented and endorsed elasticity metrics.
Workflow schedulers often rely on task runtime estimates when making scheduling decisions, and they usually target the scheduling of a single workflow or batches of workflows. In contrast, in this paper, we evaluate the impact of the absence or limited accuracy of task runtime estimates on slowdown when scheduling complete workloads of workflows that arrive over time. We study a total of seven scheduling policies: four of these are popular existing policies for (batches of) workloads from the literature, including a simple backfilling policy which is not aware of task runtime estimates, two are novel workloadoriented policies, including one which targets fairness, and one is the well-known HEFT policy for a single workflow adapted to the online workload scenario. We simulate homogeneous and heterogeneous distributed systems to evaluate the performance of these policies under varying accuracy of task runtime estimates. Our results show that for high utilizations, the order in which workflows are processed is more important than the knowledge of correct task runtime estimates. Under low utilizations, all policies considered show good results, even a policy which does not use task runtime estimates. We also show that our Fair Workflow Prioritization (FWP) policy effectively decreases the variance of workflow slowdown and thus achieves fairness, and that the planbased scheduling policy derived from HEFT does not show much performance improvement while bringing extra complexity to the scheduling process.
Graphs are a natural fit for modeling concepts used in solving diverse problems in science, commerce, engineering, and governance. Responding to the diversity of graph data and algorithms, many parallel and distributed graph-processing systems exist. However, until now these platforms use a static model of deployment: they only run on a pre-defined set of machines. This raises many conceptual and pragmatic issues, including misfit with the highly dynamic nature of graph processing, and could lead to resource waste and high operational costs. In contrast, in this work we explore the benefits and drawbacks of the dynamic model of deployment. Building a threelayer benchmarking framework for assessing elasticity in graph analytics, we conduct an in-depth elasticity study of distributed graph processing. Our framework is composed of state-ofthe-art workloads, autoscalers, and metrics, derived from the LDBC Graphalytics benchmark and SPEC RG Cloud Group's elasticity metrics. We uncover the benefits and cost of elasticity in graph processing: while elasticity allows for fine-grained resource management, and does not degrade application performance, we find that graph workloads are sensitive to data migration while leasing or releasing resources. Moreover, we identify non-trivial interactions between scaling policies and graph workloads, which add an extra level of complexity to resource management and scheduling for graph processing.
Many fields of modern science require huge amounts of computation, and workflows are a very popular tool in e-Science since they allow to organize many small, simple tasks to solve big problems. They are used in astronomy, bioinformatics, machine learning, social network analysis, physics, and many other branches of science. Workflows are notoriously difficult to schedule, and the vast majority of research on workflow scheduling is concerned with scheduling single workflows with known runtimes. The goal of this PhD research is to bring more realism to the problem of workflow scheduling in actual systems. First, in real systems, multiple workflows may be contending for the available resources. Second, task runtime estimates are not always known, and task runtime estimates may be wrong. Third, workflows are usually not the only type of jobs submitted to a system, there may for instance also be parallel applications and bags-oftasks. Accordingly, the purpose of this PhD research is to create and analyze policies for online scheduling of workloads of workflows with and without known task runtimes that also contain jobs of other types. We are in the process of simulating policies, and we will validate our results by means of an implementation and real-world experiments with the KOALA-W workflow processing system.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.