Recent studies [17, 12] show that by leveraging the benefits of high-performance interconnects such as InfiniBand, MapReduce job execution time can be greatly reduced through additional features such as in-memory merge, pipelined merge and reduce, and prefetching and caching of map outputs. In this paper, we argue that it is time for a new performance model for the RDMA-based design of MapReduce over high-performance interconnects. Our initial results derived from the proposed analytical model match the experimental results within a 3-5% range.
Motivation

The authors of [17, 12] present enhanced designs and algorithms for the RDMA-based MapReduce framework. With these design changes, MapReduce job execution can be greatly accelerated by leveraging the benefits of high-performance interconnects. The high-performance design of Hadoop (Hadoop-RDMA) [3] also shows that significant performance benefits are achievable over RDMA-capable interconnects through enhanced designs of various components inside Hadoop (HDFS [6], MapReduce [12], RPC [9]). On the other hand, much performance-modeling research [4, 8, 2, 1, 13, 5, 7, 10, 11] has been carried out to analyze the default MapReduce framework in depth. However, because of the inherent architectural changes, these models are not appropriate for predicting the performance of RDMA-based enhanced MapReduce. For example, Table 1 compares the performance of the Sort benchmark on default Hadoop [16] and on enhanced MapReduce with RDMA [12] against the predictions of the performance model in [4]. This clearly illustrates the need for a new model for the enhanced design of MapReduce.

Table 1: Comparison using Sort
Our Approach

In the RDMA-based enhanced design of MapReduce, all of the new features are added inside the ReduceTask. Thus, to predict the performance of this design correctly, we model the performance of the ReduceTask from scratch. In the default MapReduce framework, the execution time of a single ReduceTask, t_RT, is calculated from the execution times of the different phases in the ReduceTask:

t_RT = t_shuffle + t_merge + t_reduce    (1)

For the RDMA-based design, on the other hand, t_RT is not as simple as in the default case. Because the three phases fully overlap, t_RT can be rewritten as:

t_RT = max(t_shuffle, t_merge) + α * t_reduce    (2)

Here, α represents the fraction of the total data that still resides in memory, yet to be reduced, once both the shuffle and merge phases have completed. Also, because of the architectural changes in the enhanced design, each of the parameters t_shuffle, t_merge, and t_reduce must be re-modeled to incorporate the new design enhancements.
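The difference between the two cost models can be sketched in a few lines of Python. This is an illustrative sketch only; the phase times and the value of α below are hypothetical inputs, not measurements from this work.

```python
def t_rt_default(t_shuffle, t_merge, t_reduce):
    """Eq. (1): the shuffle, merge, and reduce phases run sequentially."""
    return t_shuffle + t_merge + t_reduce

def t_rt_rdma(t_shuffle, t_merge, t_reduce, alpha):
    """Eq. (2): shuffle and merge fully overlap, so only the longer of
    the two contributes; alpha is the fraction of the data still to be
    reduced after both phases finish."""
    return max(t_shuffle, t_merge) + alpha * t_reduce

# Example with hypothetical phase times (seconds):
# default: 40 + 30 + 20 = 90; RDMA-based: max(40, 30) + 0.3 * 20 = 46
print(t_rt_default(40, 30, 20))
print(t_rt_rdma(40, 30, 20, 0.3))
```

The sketch makes the source of the speedup explicit: the overlap removes min(t_shuffle, t_merge) from the critical path, and only the residual fraction α of the reduce work remains exposed.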
Contribution

[Figure 1: Model validation in Stampede Cluster. Job execution time (sec) vs. cluster size, comparing experimental results with the model.]

We validate our model for enhanced MapReduce using TeraSort [15] on Stampede [14]. We vary the cluster size from 8 to 128, while increasing the data size exponentially from 4...