A comparative analysis of iterative MapReduce systems

Kang, Minseo; Lee, Jae-Gil

doi:10.1145/3007818.3007819

Cited by 4 publications (2 citation statements); references 5 publications


“…Kang and Lee [47] examined five resource management frameworks including Apache Hadoop and Spark with respect to performance overheads (disk input/output, network communication, scheduling, etc.) in supporting iterative computation.…”

Section: Machine Learning and Iterative Tasks Support; mentioning, confidence: 99%

Big Data in Cloud Computing: A Resource Management Perspective. Ullah, Awan, Khiyal. Scientific Programming, 2018.


Modern-day advances are increasingly digitizing our lives, which has led to a rapid growth of data. Such multidimensional datasets are precious because of their potential for unearthing new knowledge and developing decision-making insights. Analyzing this huge amount of data from multiple sources can help organizations plan for the future and anticipate changing market trends and customer requirements. While the Hadoop framework is a popular platform for processing large datasets, a number of other computing infrastructures are available for use in various application domains. The primary focus of the study is how to classify major big data resource management systems in the context of cloud computing environments. We identify some key features that characterize big data frameworks, as well as their associated challenges and issues. We use various evaluation metrics from different aspects to identify usage scenarios for these platforms. The study came up with some interesting findings that contradict the available literature on the Internet.
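The disk I/O overhead named in the Kang and Lee citation statement above can be made concrete with a toy model. It assumes that a Hadoop-style iterative job re-reads its input from storage at the start of every iteration, while a Spark-style job loads the input once and keeps it in memory. The `CountingStore` class and the power-iteration workload are hypothetical, chosen only to count storage reads; network communication and scheduling costs are not modeled.

```python
from typing import List

Matrix = List[List[float]]


class CountingStore:
    """Hypothetical stand-in for distributed storage (e.g. HDFS) that counts reads."""

    def __init__(self, data: Matrix) -> None:
        self._data = data
        self.reads = 0

    def load(self) -> Matrix:
        self.reads += 1
        # Copy the rows to mimic deserializing the dataset on every read.
        return [row[:] for row in self._data]


def step(matrix: Matrix, vec: List[float]) -> List[float]:
    """One power-iteration step: matrix-vector product, normalized to sum to 1."""
    out = [sum(a * b for a, b in zip(row, vec)) for row in matrix]
    total = sum(out)
    return [x / total for x in out]


def iterate_reloading(store: CountingStore, vec: List[float], iters: int) -> List[float]:
    # MapReduce-style: each iteration behaves like a separate job that re-reads its input.
    for _ in range(iters):
        vec = step(store.load(), vec)
    return vec


def iterate_cached(store: CountingStore, vec: List[float], iters: int) -> List[float]:
    # Spark-style: load once and keep the dataset in memory across iterations.
    matrix = store.load()
    for _ in range(iters):
        vec = step(matrix, vec)
    return vec


if __name__ == "__main__":
    data = [[0.9, 0.2], [0.1, 0.8]]
    hadoop_like, spark_like = CountingStore(data), CountingStore(data)
    r1 = iterate_reloading(hadoop_like, [0.5, 0.5], iters=20)
    r2 = iterate_cached(spark_like, [0.5, 0.5], iters=20)
    print(hadoop_like.reads, spark_like.reads)  # 20 reads vs. 1 read
    print(r1 == r2)  # same result, different I/O cost
```

Both variants compute the same vector; only the number of reads grows with the iteration count in the reloading variant, which is the kind of per-iteration overhead the cited comparison measures on real systems.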


“…Multiple Spark jobs initiated by different threads may run concurrently within each Spark application, which gets its own executor processes. Spark runs long-running processes and threads, which stay up through the entire duration of the application and execute tasks in multiple threads, to avoid the overhead of repeatedly invoking tasks [9,10]. Allocation of executor resources on the cluster can be controlled by the Spark YARN client using the --num-executors option, which overrides Spark's built-in dynamic resource allocation (DRA) mechanism [18].…”

Section: Spark Architecture and Resilient Distributed Dataset (RDD); mentioning, confidence: 99%

Best Trade-Off Point Method for Efficient Resource Provisioning in Spark. Nghiem. Algorithms, 2018.


Considering the recent exponential growth in the amount of information processed in Big Data, the high energy consumed by data processing engines in datacenters has become a major issue, underlining the need for efficient resource allocation for more energy-efficient computing. We previously proposed the Best Trade-off Point (BToP) method, which provides a general approach, together with techniques based on an algorithm with mathematical formulas, for finding the best trade-off point on an elbow curve of performance vs. resources for efficient resource provisioning in Hadoop MapReduce. The BToP method is expected to work for any application or system that relies on a trade-off elbow curve, non-inverted or inverted, for making good decisions. In this paper, we apply the BToP method to the emerging cluster computing framework, Apache Spark, and show that its performance and energy consumption are better than those of Spark with its built-in dynamic resource allocation enabled. Our Spark-Bench tests confirm the effectiveness of using the BToP method with Spark to determine the optimal number of executors for any workload in production environments where job profiling for behavioral replication will lead to the most efficient resource provisioning.
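The abstract does not give the BToP formulas, so the sketch below swaps in a common, simpler elbow heuristic instead: normalize both axes to [0, 1], then pick the point on the (executors, runtime) curve that lies farthest from the straight line joining the first and last points. It is not the BToP method. The executor counts and runtimes in the usage example are made-up illustration values, not measurements from the paper.

```python
from typing import List, Sequence


def _normalize(vals: Sequence[float]) -> List[float]:
    """Rescale values to [0, 1] so that the two axes are comparable."""
    lo, hi = min(vals), max(vals)
    if hi == lo:
        raise ValueError("values must not all be equal")
    return [(v - lo) / (hi - lo) for v in vals]


def elbow_index(xs: Sequence[float], ys: Sequence[float]) -> int:
    """Index of the point farthest from the chord joining the first and last points.

    Max-distance-to-chord elbow heuristic on normalized axes; NOT the BToP method.
    Assumes xs (e.g. executor counts) is sorted in increasing order.
    """
    if len(xs) != len(ys) or len(xs) < 3:
        raise ValueError("need at least three (x, y) pairs")
    nx, ny = _normalize(xs), _normalize(ys)
    dx, dy = nx[-1] - nx[0], ny[-1] - ny[0]
    norm = (dx * dx + dy * dy) ** 0.5
    best, best_dist = 0, -1.0
    for i, (x, y) in enumerate(zip(nx, ny)):
        # Perpendicular distance from (x, y) to the line through the endpoints.
        dist = abs(dy * (x - nx[0]) - dx * (y - ny[0])) / norm
        if dist > best_dist:
            best, best_dist = i, dist
    return best


if __name__ == "__main__":
    executors = [1, 2, 3, 4, 5, 6, 7, 8]            # illustrative values
    runtime_s = [100, 55, 38, 30, 27, 26, 25.5, 25]  # illustrative, not measured
    print(executors[elbow_index(executors, runtime_s)])  # -> 3
```

Normalizing first keeps the choice independent of the units on either axis; beyond the chosen point, extra executors buy comparatively little runtime, which is the trade-off an elbow-based method targets.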

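The executor-allocation point in the passage quoted from Nghiem above can be sketched as a `spark-submit` invocation on YARN that fixes the executor count with `--num-executors` and turns dynamic allocation off explicitly. The helper only assembles the argument list, so it can be inspected or handed to `subprocess.run`; the function name `build_spark_submit`, the script name `etl_job.py`, and the default resource values are hypothetical. `--num-executors`, `--executor-cores`, `--executor-memory`, and the `spark.dynamicAllocation.*` properties are real Spark options, but how a fixed executor count interacts with dynamic allocation has changed across Spark versions, so the documentation for the version in use should be checked.

```python
import shlex
from typing import List, Optional


def build_spark_submit(
    app: str,
    num_executors: Optional[int] = None,
    executor_cores: int = 2,
    executor_memory: str = "4g",
    max_executors: int = 20,
) -> List[str]:
    """Assemble a hypothetical spark-submit command line for YARN.

    If num_executors is given, the count is fixed and dynamic allocation is
    disabled; otherwise dynamic allocation is enabled with an upper bound.
    """
    cmd = [
        "spark-submit",
        "--master", "yarn",
        "--executor-cores", str(executor_cores),
        "--executor-memory", executor_memory,
    ]
    if num_executors is not None:
        # Static provisioning, e.g. a count chosen offline by profiling the job.
        cmd += ["--num-executors", str(num_executors),
                "--conf", "spark.dynamicAllocation.enabled=false"]
    else:
        # Let Spark scale executors at runtime, up to max_executors.
        # Dynamic allocation also needs the external shuffle service (or, in
        # Spark 3.x, shuffle tracking); add the matching --conf for your cluster.
        cmd += ["--conf", "spark.dynamicAllocation.enabled=true",
                "--conf", f"spark.dynamicAllocation.maxExecutors={max_executors}"]
    cmd.append(app)
    return cmd


if __name__ == "__main__":
    print(shlex.join(build_spark_submit("etl_job.py", num_executors=3)))
```

Keeping the two modes in one helper makes the trade-off explicit: a profiled, fixed executor count on one side and Spark's runtime scaling on the other.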

An experimental analysis of limitations of MapReduce for iterative algorithms on Spark. Kang, Lee. Cluster Computing, 2017.

