Hadoop MapReduce is a community-accepted platform for processing very large datasets efficiently and cost-effectively. To cope with ever-growing datasets and shrinking time to analyze them, Hadoop MapReduce parallelizes computation across large distributed clusters of many machines. Careful consideration of the factors that affect Hadoop MapReduce can improve its performance. Much research has been done on reducing the total job execution time of MapReduce by optimizing different parameters, but the effect of the replication factor on MapReduce job completion time remains unexplored. This paper evaluates the effect of the data replication factor on MapReduce job completion time using regression analysis. The performance of a Hadoop MapReduce job, measured as total job completion time, is monitored experimentally under different replication factor values. The evaluation results clearly show that job completion time depends on the replication factor. This dependence has been verified both analytically and experimentally.
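As a rough illustration of the kind of regression analysis the abstract mentions, the sketch below fits a straight line to pairs of (replication factor, job completion time) by ordinary least squares. The abstract does not state which regression model the authors used, so the linear form and the helper names `fit_line` and `r_squared` are assumptions made here for illustration only, and no measured values from the paper are reproduced.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x.

    xs: replication factors tried (hypothetical input)
    ys: measured job completion times for those factors (hypothetical input)
    Returns (slope, intercept).
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of squared deviations of x; zero means all x values are identical.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Coefficient of determination for the fitted line."""
    y_bar = mean(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    if ss_tot == 0:
        raise ValueError("all y values are identical; R^2 is undefined")
    return 1 - ss_res / ss_tot
```

Calling `fit_line` with the replication factors used in an experiment and the corresponding completion times returns a slope whose sign and size indicate how completion time changes per unit of replication, and `r_squared` gives a rough measure of how well a straight line describes that dependence.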
Vast volumes of big data result from data arriving from diverse sources. Conventional computational frameworks and platforms are unable to handle complex big data sets and process them quickly. Cloud data centers, with their massive virtual and physical resources and computing platforms, can support big data processing. In addition, the well-known MapReduce framework, in conjunction with cloud data centers, provides fundamental support for scaling up and speeding up the classification, investigation, and processing of huge, complex big data sets. Inappropriate handling of cloud data center resources will not yield significant results and eventually leads to poor utilization of the overall system. This research aims to analyze and optimize the number of compute nodes that follow the MapReduce framework on the computational resources of a cloud data center, focusing on the key issue of computational overhead caused by inappropriate parameter selection and on reducing overall execution time. The evaluation has been carried out experimentally by varying the number of compute nodes, that is, map and reduce units. The results clearly show that appropriate handling of compute nodes has a significant effect on the overall performance of the cloud data center in terms of total execution time.