The deployment of MapReduce in datacenters and clouds presents several challenges in achieving good job performance. Compared to in-house dedicated clusters, datacenters and clouds often exhibit significant hardware and performance heterogeneity due to continuous server replacement and multitenant interference. As most MapReduce implementations assume homogeneous clusters, heterogeneity can cause significant load imbalance in task execution, leading to poor performance and low cluster utilization. Despite existing optimizations on task scheduling and load balancing, MapReduce still performs poorly on heterogeneous clusters. In this paper, we find that the homogeneous configuration of tasks on heterogeneous nodes can be an important source of load imbalance and thus cause poor performance. Tasks should be customized with different settings to match the capabilities of heterogeneous nodes. To this end, we propose an adaptive task tuning approach, Ant, that automatically finds the optimal settings for individual tasks running on different nodes. Ant works best for large jobs with multiple rounds of map task execution. It first configures tasks with randomly selected configurations and gradually improves task settings by reproducing the settings from the best-performing tasks and discarding poorly performing configurations. To accelerate task tuning and avoid becoming trapped in local optima, Ant uses genetic functions during task configuration. Experimental results on a heterogeneous cluster and a virtual cluster with varying hardware capabilities show that Ant improves the average job completion time by 23%, 11%, and 16% compared to stock Hadoop, customized Hadoop with industry recommendations, and a profiling-based configuration approach, respectively.
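The tuning loop described in this abstract (random initial configurations, reproduction of the best performers, removal of the poor ones, and genetic operators to escape local optima) can be illustrated with a small sketch. This is not the authors' implementation: the parameter names and ranges, the `tune` function, the survivor count, the mutation rate, and the synthetic `toy_task_time` cost are all assumptions made for illustration.

```python
import random

# Hypothetical tunable parameters and ranges (not taken from the paper).
PARAM_RANGES = {
    "sort_buffer_mb": (50, 400),
    "spill_percent": (50, 90),
}

def random_config(rng):
    """Draw one configuration uniformly at random from the ranges."""
    return {k: rng.randint(lo, hi) for k, (lo, hi) in PARAM_RANGES.items()}

def crossover(a, b, rng):
    """Build a child by taking each parameter from one of two parents."""
    return {k: rng.choice((a[k], b[k])) for k in PARAM_RANGES}

def mutate(cfg, rng, rate=0.2):
    """Re-draw each parameter with probability `rate` to keep diversity."""
    out = dict(cfg)
    for k, (lo, hi) in PARAM_RANGES.items():
        if rng.random() < rate:
            out[k] = rng.randint(lo, hi)
    return out

def tune(run_task, waves=10, tasks_per_wave=8, keep=4, seed=0):
    """Evolve configurations over several waves of task execution.

    run_task(cfg) returns a completion time (lower is better).
    Returns the best configuration seen and its completion time.
    """
    rng = random.Random(seed)
    population = [random_config(rng) for _ in range(tasks_per_wave)]
    best_cfg, best_time = None, float("inf")
    for _ in range(waves):
        # One wave: run every task with its own configuration.
        scored = sorted(((run_task(c), c) for c in population),
                        key=lambda pair: pair[0])
        if scored[0][0] < best_time:
            best_time, best_cfg = scored[0]
        # Keep the fastest configurations, discard the rest.
        survivors = [c for _, c in scored[:keep]]
        # Refill the wave with mutated crossovers of the survivors.
        children = []
        while len(survivors) + len(children) < tasks_per_wave:
            p1, p2 = rng.sample(survivors, 2)
            children.append(mutate(crossover(p1, p2, rng), rng))
        population = survivors + children
    return best_cfg, best_time

def toy_task_time(cfg):
    """Synthetic stand-in for a measured task completion time."""
    return abs(cfg["sort_buffer_mb"] - 200) + abs(cfg["spill_percent"] - 80)

best_cfg, best_time = tune(toy_task_time, seed=1)
```

In a real deployment `run_task` would be replaced by the measured completion time of a map task on a given node, and one such loop would presumably run per node so that each node converges toward settings that suit its own hardware.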
As Hadoop is becoming increasingly popular in large-scale data analysis, there is a growing need for providing predictable services to users who have strict requirements on job completion times. While earliest deadline first (EDF)-like scheduling algorithms are popular for guaranteeing job deadlines in real-time systems, they are not effective in a dynamic Hadoop environment, i.e., a Hadoop cluster with dynamically available resources. As a growing number of Hadoop clusters are deployed on hybrid systems, e.g., infrastructure powered by a mix of traditional and renewable energy, and on cloud platforms hosting heterogeneous workloads, variable resource availability becomes common when running Hadoop jobs. In this paper, we propose RDS, a Resource and Deadline-aware Hadoop job Scheduler that takes future resource availability into consideration when minimizing job deadline misses. We formulate the job scheduling problem as an online optimization problem and solve it using an efficient receding horizon control algorithm. To aid the control, we design a self-learning model to estimate job completion times and use a simple but effective model to predict future resource availability. We have implemented RDS in the open-source Hadoop implementation and performed evaluations with various benchmark workloads. Experimental results show that RDS reduces the penalty of deadline misses by at least 36% and 10% compared with Fair Scheduler and EDF scheduler, respectively.
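A toy simulation may help show why plain EDF can struggle when resource availability varies over time. The sketch below is not RDS and does not implement receding horizon control: the `Job` fields, the `simulate` function, the per-interval slot capacities, and the `lookahead` heuristic (deprioritize jobs that the known future capacity cannot finish on time) are all assumptions chosen for illustration.

```python
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    work: int      # slot-intervals of work required (hypothetical unit)
    deadline: int  # job must finish by the end of interval deadline - 1

def simulate(jobs, capacity, lookahead=False):
    """Schedule jobs over per-interval slot capacities.

    With lookahead=False this is plain EDF. With lookahead=True, jobs that
    cannot finish on time given the remaining capacity before their deadline
    are served last. Returns the names of jobs that miss their deadline.
    """
    remaining = {j.name: j.work for j in jobs}
    finish = {}
    for t, slots in enumerate(capacity):
        def hopeless(j):
            # Capacity from now until the deadline cannot cover the work left.
            return lookahead and sum(capacity[t:j.deadline]) < remaining[j.name]

        pending = [j for j in jobs if remaining[j.name] > 0]
        # Feasible jobs first, then earliest deadline first within each group.
        for job in sorted(pending, key=lambda j: (hopeless(j), j.deadline)):
            used = min(slots, remaining[job.name])
            remaining[job.name] -= used
            slots -= used
            if remaining[job.name] == 0:
                finish[job.name] = t + 1
    return [j.name for j in jobs
            if finish.get(j.name, float("inf")) > j.deadline]

jobs = [Job("A", work=4, deadline=1), Job("B", work=2, deadline=2)]
capacity = [2, 1]  # available slots shrink in the second interval

edf_misses = simulate(jobs, capacity)                        # ["A", "B"]
aware_misses = simulate(jobs, capacity, lookahead=True)      # ["A"]
```

In this example EDF spends the first interval on job A, which cannot meet its deadline anyway, and job B then misses as well. Taking the upcoming capacity into account lets B finish on time, which is the intuition behind considering future resource availability when scheduling.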
While major cloud service operators have taken various initiatives to operate sustainable datacenters with green energy, it is challenging to utilize green energy effectively since its generation depends on dynamic natural conditions. Fortunately, the geographical distribution of datacenters provides an opportunity for optimizing system performance by distributing cloud workloads. In this paper, we propose a holistic heterogeneity-aware cloud workload placement and migration approach, sCloud, that aims to maximize the system goodput in distributed self-sustainable datacenters. sCloud adaptively places the transactional workload across distributed datacenters, allocates the available resources to heterogeneous workloads in each datacenter, and migrates batch jobs across datacenters, while taking into account green power availability and QoS requirements. We formulate the transactional workload placement as a constrained optimization problem that can be solved by nonlinear programming. Then, we propose a batch job migration algorithm to further improve the system goodput when the green power supply varies widely at different locations. We have implemented sCloud in a university cloud testbed with real-world weather conditions and workload traces. Experimental results demonstrate that sCloud can achieve near-optimal system performance while being resilient to dynamic power availability. It outperforms a heterogeneity-oblivious approach by 26% in improving system goodput and 29% in reducing QoS violations.