2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
DOI: 10.1109/ipdpsw.2016.166

Hadoop on HPC: Integrating Hadoop and Pilot-Based Dynamic Resource Management

Abstract: High-performance computing platforms such as "supercomputers" have traditionally been designed to meet the compute demands of scientific applications. Consequently, they have been architected as net producers, not consumers, of data. The Apache Hadoop ecosystem has evolved to meet the requirements of data processing applications and has addressed many of the traditional limitations of HPC platforms. There exists a class of scientific applications, however, that needs the collective capabilities of trad…

Cited by: 17 publications (11 citation statements)
References: 23 publications
“…While some ensemble applications are data-flow oriented and thus amenable to be implemented with MapReduce, EnTK adopts a more flexible and coarse-grained notion of tasks, where a task in EnTK can support multiple programming models, including MPI. Further, EnTK does not assume a specific runtime system and, in conjunction with RP, can use Hadoop on HPC [21].…”
Section: Related Work
confidence: 99%
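To make the contrast with MapReduce concrete, the following is a minimal sketch, assuming the radical.entk Python package and its documented Pipeline/Stage/Task model, of a coarse-grained EnTK task that runs an MPI executable rather than a map/reduce function. The executable path, resource label, and cpu_reqs keys are illustrative assumptions and vary across EnTK versions.

    # A minimal sketch, assuming the radical.entk package; attribute names
    # and cpu_reqs keys follow the EnTK documentation and may differ by version.
    from radical.entk import Pipeline, Stage, Task, AppManager

    # One coarse-grained task: an MPI executable, not a map/reduce function.
    task = Task()
    task.executable = '/path/to/mpi_simulation'   # hypothetical application
    task.arguments  = ['--steps', '1000']
    task.cpu_reqs   = {'cpu_processes':    16,    # MPI ranks
                       'cpu_process_type': 'MPI',
                       'cpu_threads':      1,
                       'cpu_thread_type':  None}

    stage = Stage()
    stage.add_tasks(task)

    pipeline = Pipeline()
    pipeline.add_stages(stage)

    # The AppManager hands the workflow to a runtime system such as RADICAL-Pilot.
    amgr = AppManager()
    amgr.resource_desc = {'resource': 'xsede.wrangler',   # placeholder label
                          'walltime': 30,
                          'cpus':     48}
    amgr.workflow = [pipeline]
    amgr.run()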
“…TACC Wrangler has 24 Haswell hyper-threading enabled cores/node and 128 GB memory/node (120 nodes). Experiments were carried out using RADICAL-Pilot and the Pilot-Spark [19] extension, which allows Spark to be managed efficiently on HPC resources through a common resource management API. We utilize a set of custom scripts to start the Dask cluster.…”
Section: Experiments and Discussion
confidence: 99%
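The "custom scripts" the excerpt mentions are not published with the quote; the following is a hypothetical sketch of that pattern, assuming the dask.distributed package: the head node of the allocation runs dask-scheduler, the remaining nodes run dask-worker pointed at it, and the application connects a Client. The address 'head-node' is a placeholder.

    # Hypothetical stand-in for such a startup script, assuming that
    # `dask-scheduler` runs on the head node and each remaining node runs
    # `dask-worker tcp://head-node:8786` before this code executes.
    import socket

    from dask.distributed import Client

    # Connect to the manually started scheduler ('head-node' is a placeholder).
    client = Client('tcp://head-node:8786')

    # Sanity check: report which hosts the workers actually landed on.
    print(client.run(socket.gethostname))

    # Fan a trivial computation out across the ad hoc cluster.
    futures = client.map(lambda x: x * x, range(100))
    print(sum(client.gather(futures)))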
“…That integration allows users to create Hadoop jobs with ease via a GUI, but it also requires users to set up and configure a distributed Hadoop cluster on the HPC platform. Hadoop-on-HPC [2] integrates Hadoop with RADICAL-Pilot (RP): Hadoop extends RP with the MapReduce programming paradigm, and RP's pilot capabilities enable deployment of the Hadoop cluster and scheduling of the workflow's tasks on that cluster.…”
Section: Related Work
confidence: 99%
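As an illustration of the pilot workflow described above, here is a minimal sketch based on the 2016-era RADICAL-Pilot API (class and argument names differ in newer releases). The resource label and executable are placeholders, and the Hadoop/YARN deployment that the paper performs inside the pilot is not shown.

    # A minimal sketch of the pilot pattern: a pilot acquires HPC nodes once,
    # then tasks are scheduled onto it instead of through the batch queue.
    import radical.pilot as rp

    session = rp.Session()
    try:
        # The pilot requests a multi-node allocation on the HPC machine.
        pmgr  = rp.PilotManager(session=session)
        pdesc = rp.ComputePilotDescription()
        pdesc.resource = 'xsede.wrangler'   # placeholder resource label
        pdesc.cores    = 48
        pdesc.runtime  = 30                 # minutes

        pilot = pmgr.submit_pilots(pdesc)

        # Compute units are scheduled onto the running pilot.
        umgr = rp.UnitManager(session=session)
        umgr.add_pilots(pilot)

        cuds = []
        for _ in range(4):
            cud = rp.ComputeUnitDescription()
            cud.executable = '/bin/date'    # stand-in for a MapReduce-stage task
            cuds.append(cud)

        umgr.submit_units(cuds)
        umgr.wait_units()
    finally:
        session.close()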
“…We integrated two existing, independently developed software systems and executed the workflows of two exemplar use cases on HPC resources. Based on our previous experience with integrating independent systems [1], [2], [11] and our building blocks approach [3], we describe an integration that minimizes changes in the existing code-bases while allowing users to benefit from the capabilities of both systems. We show how to approach the integration, evaluate diverse integration points and align the programming and execution models of the two systems.…”
Section: Introduction
confidence: 99%