2014 IEEE 28th International Parallel and Distributed Processing Symposium 2014
DOI: 10.1109/ipdps.2014.90

DataMPI: Extending MPI to Hadoop-Like Big Data Computing

Cited by 49 publications (32 citation statements)
References 10 publications
“…[25] analysed more applications and highlighted areas where HPC and Apache Big Data Stack have good opportunities for integration on the base of [24]. DataMPI [26] tried to extend MPI to support Hadoop-like Big Data Computing jobs. It showed performance and flexibility benefits while maintaining high productivity, scalability, and fault tolerance of Hadoop.…”
Section: Discussion
Citation type: mentioning
confidence: 99%
“…DataMPI [11][12] is a key-value based communication library which extends MPI for big data applications. The design of DataMPI is based on the bipartite model, which defines the communication behavior between two built-in communicators, COMM_BIPARTITE_O and COMM_BIPARTITE_A.…”
Section: Overview of DataMPI
Citation type: mentioning
confidence: 99%
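
The bipartite model described in the statement above can be illustrated with a small, single-process sketch. This is not DataMPI's API: the function names (partition, o_task, a_task, bipartite_exchange, word_count) are hypothetical, and the sketch assumes that key-value pairs flow one way, from tasks on the O side to tasks on the A side, with each key routed to exactly one A task by a stable hash. A real implementation would move the pairs over MPI between processes rather than through in-memory lists.

```python
import zlib
from collections import defaultdict


def partition(key: str, num_a_tasks: int) -> int:
    """Choose the A-side task for a key (assumed hash partitioning).

    crc32 is used because Python's built-in hash() is salted per process.
    """
    return zlib.crc32(key.encode("utf-8")) % num_a_tasks


def o_task(lines):
    """Hypothetical O-side task: emit one (word, 1) pair per word."""
    for line in lines:
        for word in line.split():
            yield word, 1


def bipartite_exchange(o_inputs, num_a_tasks):
    """Deliver every pair emitted on the O side to exactly one A-side inbox."""
    inboxes = [defaultdict(list) for _ in range(num_a_tasks)]
    for lines in o_inputs:  # one entry per O task
        for key, value in o_task(lines):
            inboxes[partition(key, num_a_tasks)][key].append(value)
    return inboxes


def a_task(inbox):
    """Hypothetical A-side task: sum the values received for each key."""
    return {key: sum(values) for key, values in sorted(inbox.items())}


def word_count(o_inputs, num_a_tasks=2):
    """Run the O side, exchange pairs, then merge the A-side results."""
    merged = {}
    for inbox in bipartite_exchange(o_inputs, num_a_tasks):
        merged.update(a_task(inbox))  # inboxes hold disjoint key sets
    return merged


if __name__ == "__main__":
    print(word_count([["big data mpi"], ["mpi data"]]))
```

Because every key lands in a single A-side inbox, the per-task results can be merged without resolving conflicts; that property is the only thing the sketch is meant to show.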
“…The trend of converging big data and high performance computing (HPC) is emerging [6][7][8][9][10]. As a specific example of this trend, DataMPI [11][12] is proposed, which aims at extending MPI with key-value pair based communication operations to provide high performance communication in large-scale data computing scenarios. Considering different data structures, communication styles, and optimization methodologies in data computing, multiple programming paradigms are supported in DataMPI.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…A comparison of architecture and abstractions between HPC and Apache Big Data Stacks (ABDS) is presented in [1] and the authors argued that a convergence between the two at many levels can be observed. While regular Hadoop uses the Java-based Netty package for distributed communication, several works have proposed to use Message Passing Interface (MPI) libraries, which are typically C/C++ based, to achieve better performance, especially on HPC clusters with high-speed networks [9][2]. A comprehensive assessment of the performance impact of high-speed interconnects (including 10 Gbps Ethernet and InfiniBand) on MapReduce is presented in [3].…”
Section: Background and Motivation
Citation type: mentioning
confidence: 99%
“…While Big Data software packages, such as Hadoop, were initially developed for inexpensive commodity workstations, as multi-core machines equipped with large memory capacities and hardware accelerators are becoming increasingly affordable, new Big Data systems that can take advantage of new hardware features and deliver high performance, such as Apache Spark and Cloudera Impala for in-memory and in-network processing, are becoming preferable. As a result, there is growing interest in using High Performance Computing (HPC) facilities that are typically equipped with powerful processors (including accelerators) and high-speed networks for Big Data applications [1][2][3][4]. Unfortunately, access to HPC facilities is very often restricted and it is very difficult (if not impossible) to reconfigure HPC platforms for research purposes.…”
Section: Introduction
Citation type: mentioning
confidence: 99%