This article addresses the scheduling of heterogeneous scientific workflows while minimizing the energy consumption of the cloud provider, by introducing a deadline-sensitive algorithm. Scheduling in a cloud environment is a difficult optimization problem. Usually, work on the scheduling of scientific workflows focuses on public clouds, where infrastructure management is an opaque black box. Thus, many works propose scheduling algorithms designed to select the best set of virtual machines over time, so that the cost to the end user is minimized. This article presents a new HEFT-based algorithm that takes into account users' deadlines to minimize the number of machines used by the cloud provider. The results show the real benefits of using our algorithm for reducing the energy consumption of the cloud provider.
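For context, the classic HEFT heuristic that the abstract's algorithm builds on can be sketched as follows. This is a minimal illustration only: the task graph, costs, and machine count are hypothetical, communication costs are omitted, and the paper's deadline-aware, machine-minimizing variant is not reproduced here.

```python
# Minimal HEFT-style list scheduling sketch (illustrative; hypothetical inputs).
# Tasks are ranked by "upward rank" (own cost + longest path to a sink),
# then greedily placed on the machine giving the earliest finish time.

def upward_rank(task, succ, cost, memo):
    """Rank = task cost + max rank over successors (comm. costs omitted)."""
    if task not in memo:
        memo[task] = cost[task] + max(
            (upward_rank(s, succ, cost, memo) for s in succ.get(task, [])),
            default=0.0,
        )
    return memo[task]

def heft(tasks, succ, cost, n_machines):
    """Return {task: (machine, start, end)} respecting dependencies."""
    pred = {t: [] for t in tasks}
    for t, ss in succ.items():
        for s in ss:
            pred[s].append(t)
    memo = {}
    order = sorted(tasks, key=lambda t: upward_rank(t, succ, cost, memo),
                   reverse=True)
    machine_free = [0.0] * n_machines  # earliest idle time per machine
    sched = {}
    for t in order:
        # A task may only start once all its predecessors have finished.
        ready = max((sched[p][2] for p in pred[t]), default=0.0)
        m = min(range(n_machines),
                key=lambda i: max(machine_free[i], ready) + cost[t])
        start = max(machine_free[m], ready)
        sched[t] = (m, start, start + cost[t])
        machine_free[m] = start + cost[t]
    return sched
```

A deadline-sensitive variant, as described in the abstract, would additionally reject or repack placements whose finish time exceeds the user's deadline, so that fewer machines stay powered on.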
With the democratization of the Cloud paradigm, many applications are developed to be executed inside virtual machines hosted by remote data centers providing an Infrastructure-as-a-Service (IaaS). These applications, developed by different users with different goals, tend to have different behaviors, hence a uniform treatment on the Cloud provider side seems to be sub-optimal. Indeed, VMs are black boxes to which vCPUs are attached, whose frequencies are all the same and mainly indicative. In our opinion, an important limitation can be noted here. Because the Cloud provider is unaware of the applications that are executed inside the VMs, it has little insight into the behavior of the applications, and into how to manage the VMs. For these reasons, the Cloud provider can assign too many or too few resources to a VM, and might rely on migration mechanisms to cope with that problem. In this paper, we propose to attach a virtual frequency to the VM template, which can be configured by the customer to better describe her expected application requirements and the associated quality of service. Then, to enforce this virtual frequency, we designed a controller that leverages the Linux cgroup system to dynamically adjust the configuration on the host machine. We evaluate our new controller on a real infrastructure with real CPU-intensive applications executed by VMs with different frequencies. We also discuss the benefits of our virtual frequency capping for VM placement.
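One way such a cgroup-based controller could enforce a virtual frequency is by mapping it to a CPU bandwidth quota. The sketch below uses the cgroup v2 `cpu.max` interface (quota and period in microseconds); the cgroup path, frequency values, and the static mapping itself are assumptions for illustration, not the paper's actual dynamic controller.

```python
# Sketch: translate a VM's "virtual frequency" into a cgroup v2 cpu.max entry.
# cpu.max expects "<quota_us> <period_us>": the group may run quota_us
# microseconds of CPU time per period_us window, across all its threads.

def cpu_max_line(virtual_mhz, host_mhz, n_vcpus, period_us=100_000):
    """Cap a VM so each vCPU gets roughly virtual_mhz/host_mhz of a core:
    quota = period * n_vcpus * (virtual_mhz / host_mhz)."""
    quota_us = int(period_us * n_vcpus * virtual_mhz / host_mhz)
    return f"{quota_us} {period_us}"

# Hypothetical usage: a 2-vCPU VM capped at half of a 3000 MHz host clock.
# with open("/sys/fs/cgroup/vm-1234/cpu.max", "w") as f:   # path is illustrative
#     f.write(cpu_max_line(1500, 3000, 2))
```

A real controller would rewrite this quota dynamically as the host's load and the VM mix change, which is what the abstract's evaluation measures.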
This article tackles the problem of scheduling multi-user scientific workflows with unpredictable random arrivals and uncertain task execution times in a Cloud environment, from the Cloud provider's point of view. The solution consists in a deadline-sensitive online algorithm, named NEARDEADLINE, that optimizes two metrics: the energy consumption and the fairness between users. Scheduling workflows in a private Cloud environment is a difficult optimization problem, as capacity constraints must be satisfied in addition to dependency constraints between tasks of the workflows. Furthermore, NEARDEADLINE is built upon a new workflow execution platform. As far as we know, no existing work tries to combine both energy consumption and fairness metrics in its optimization problem. The experiments conducted on a real infrastructure (clusters of Grid'5000) demonstrate that the NEARDEADLINE algorithm offers real benefits in reducing energy consumption and enhancing user fairness.
The goal of a workflow engine is to facilitate the writing, deployment, and execution of a scientific workflow (i.e., a graph of coarse-grain and heterogeneous tasks) on distributed infrastructures. With the democratization of the Cloud paradigm, many state-of-the-art workflow engines offer a way to execute workflows on distant data centers by using the Infrastructure-as-a-Service (IaaS) or the Function-as-a-Service (FaaS) services of Cloud providers. Hence, workflow engines can take advantage of the (presumably) infinite resources and the economic model of the Cloud. However, two important limitations lie in this vision of Cloud-oriented workflow engines. First, by using existing services of Cloud providers, and by managing the workflows on the user side, the Cloud providers are unaware of both the workflows and their users' needs, and cannot apply specific resource optimizations to their infrastructure. Second, for the same reasons, handling the heterogeneity of tasks (different operating systems) in workflows necessarily degrades either the transparency for the users (who must provision different types of resources), or the completion time performance of the workflows, because of the stacking of virtualization layers. In this paper, we tackle these two limitations by presenting a new Cloud service dedicated to scientific workflows. Unlike existing workflow engines, this service is deployed and managed by the Cloud providers, enables specific resource optimizations, and offers better control of the heterogeneity of the workflows. We evaluate our new service in comparison to Argo, a well-known workflow engine from the literature based on FaaS services. This evaluation was conducted on a real distributed experimental platform with a realistic and complex scenario.