“…This flexibility introduces challenges that must be addressed to improve makespan and resource utilization when a batch of heterogeneous jobs is periodically submitted. Moreover, heterogeneous VMs in a MapReduce virtual cluster accommodate different numbers of containers. For instance, as shown in Figure A, consider two VMs (VM1 with <4,6> and VM2 with <2,4>) and MapReduce jobs (J1, J2, …, J6) in a batch.…”
Section: Introduction
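The quoted passage notes that VMs of different capacities host different numbers of containers. A minimal sketch of how a per-VM container count could be computed, assuming the tuples `<4,6>` and `<2,4>` denote `<vCPUs, memory in GB>` and assuming a fixed container size of 1 vCPU and 1 GB (both readings are illustrative, not stated in the source):

```python
# Sketch: how many fixed-size containers a VM can host.
# The <4,6> / <2,4> tuples are read here as <vCPUs, memory in GB>;
# the container size (1 vCPU, 1 GB) is an illustrative assumption.

def container_capacity(vcpus, mem_gb, c_vcpu=1, c_mem_gb=1):
    """Container count is limited by the scarcer resource."""
    return min(vcpus // c_vcpu, mem_gb // c_mem_gb)

vms = {"VM1": (4, 6), "VM2": (2, 4)}
for name, (vcpus, mem_gb) in vms.items():
    print(name, container_capacity(vcpus, mem_gb))
```

Under these assumptions VM1 hosts twice as many containers as VM2, which is what makes capacity-oblivious scheduling wasteful.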
Summary
Big data is driving business entities and research sectors to become more data-driven. Hadoop MapReduce is one of the most cost-effective ways to process large-scale datasets and is offered as a service over the Internet. Even though cloud service providers promise a virtually infinite amount of on-demand resources, some of the virtual resources hired for MapReduce inevitably remain unutilized, and makespan suffers due to the various heterogeneities that arise when MapReduce is offered as a service. Because MapReduce v2 allows users to define the container size for map and reduce tasks, jobs in a batch become heterogeneous and behave differently. In addition, the differing capacities of virtual machines in a MapReduce virtual cluster allow them to accommodate varying numbers of map/reduce tasks. These factors strongly affect resource utilization in the virtual cluster and the makespan of a batch of MapReduce jobs. Default MapReduce job schedulers do not consider these heterogeneities of the cloud environment. Moreover, virtual machines in a MapReduce virtual cluster process an equal number of blocks regardless of their capacity, which lengthens the makespan. We therefore devised a heuristic-based MapReduce job scheduler that exploits virtual-machine- and workload-level heterogeneities to improve resource utilization and makespan. We proposed two methods to achieve this: (i) roulette-wheel-scheme-based data block placement on heterogeneous virtual machines, and (ii) constrained two-dimensional bin packing to place heterogeneous map/reduce tasks. We compared the heuristic-based MapReduce job scheduler against the classical fair scheduler in MapReduce v2. Experimental results showed that our proposed scheduler improved makespan and resource utilization by 45.6% and 47.9%, respectively, over the classical fair scheduler.
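The first of the two proposed methods places data blocks on VMs via a roulette-wheel scheme, i.e., a VM receives a block with probability proportional to its capacity. A minimal sketch of that selection step, assuming capacity weights of 6 and 4 for two VMs (illustrative values, not taken from the paper's experiments):

```python
import random

def roulette_wheel_place(blocks, capacities, rng=random.random):
    """Assign each block to a VM with probability proportional to its capacity."""
    total = sum(capacities.values())
    # Build cumulative probability boundaries, one per VM (the "wheel").
    wheel, acc = [], 0.0
    for vm, cap in capacities.items():
        acc += cap / total
        wheel.append((acc, vm))
    placement = {vm: [] for vm in capacities}
    for block in blocks:
        spin = rng()  # uniform in [0, 1)
        for boundary, vm in wheel:
            if spin <= boundary:
                placement[vm].append(block)
                break
    return placement

capacities = {"VM1": 6, "VM2": 4}  # hypothetical capacity weights
placement = roulette_wheel_place(range(100), capacities)
```

Over many blocks, VM1 ends up holding roughly 60% of them, so higher-capacity VMs receive proportionally more data than the equal-split default the abstract criticizes.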