2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
DOI: 10.1109/ipdpsw.2016.166

Hadoop on HPC: Integrating Hadoop and Pilot-Based Dynamic Resource Management

Abstract: High-performance computing platforms such as "supercomputers" have traditionally been designed to meet the compute demands of scientific applications. Consequently, they have been architected as net producers, not consumers, of data. The Apache Hadoop ecosystem has evolved to meet the requirements of data processing applications and has addressed many of the traditional limitations of HPC platforms. There exists a class of scientific applications, however, that needs the collective capabilities of trad…

Cited by: 17 publications (11 citation statements)
References: 23 publications
“…While some ensemble applications are data-flow oriented and thus amenable to be implemented with MapReduce, EnTK adopts a more flexible and coarse-grained notion of tasks, where a task in EnTK can support multiple programming models, including MPI. Further, EnTK does not assume a specific runtime system and, in conjunction with RP, can use Hadoop on HPC [21].…”
Section: Related Work
confidence: 99%
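To make the contrast with MapReduce concrete, the following is a minimal sketch, assuming the radical.entk Python package and its documented Pipeline/Stage/Task model, of a coarse-grained EnTK task that runs an MPI executable rather than a map/reduce function. The executable path, resource label, and cpu_reqs keys are illustrative assumptions and vary across EnTK versions.

    # A minimal sketch, assuming the radical.entk package; attribute names
    # and cpu_reqs keys follow the EnTK documentation and may differ by version.
    from radical.entk import Pipeline, Stage, Task, AppManager

    # One coarse-grained task: an MPI executable, not a map/reduce function.
    task = Task()
    task.executable = '/path/to/mpi_simulation'   # hypothetical application
    task.arguments  = ['--steps', '1000']
    task.cpu_reqs   = {'cpu_processes':    16,    # MPI ranks
                       'cpu_process_type': 'MPI',
                       'cpu_threads':      1,
                       'cpu_thread_type':  None}

    stage = Stage()
    stage.add_tasks(task)

    pipeline = Pipeline()
    pipeline.add_stages(stage)

    # The AppManager hands the workflow to a runtime system such as RADICAL-Pilot.
    amgr = AppManager()
    amgr.resource_desc = {'resource': 'xsede.wrangler',   # placeholder label
                          'walltime': 30,
                          'cpus':     48}
    amgr.workflow = [pipeline]
    amgr.run()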
“…TACC Wrangler has 24 Haswell hyper-threading enabled cores/node and 128 GB memory/node (120 nodes). Experiments were carried out using RADICAL-Pilot and the Pilot-Spark [19] extension, which allows Spark to be managed efficiently on HPC resources through a common resource management API. We utilize a set of custom scripts to start the Dask cluster.…”
Section: Experiments and Discussion
confidence: 99%
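The "custom scripts" the excerpt mentions are not published with the quote; the following is a hypothetical sketch of that pattern, assuming the dask.distributed package: the head node of the allocation runs dask-scheduler, the remaining nodes run dask-worker pointed at it, and the application connects a Client. The address 'head-node' is a placeholder.

    # Hypothetical stand-in for such a startup script, assuming that
    # `dask-scheduler` runs on the head node and each remaining node runs
    # `dask-worker tcp://head-node:8786` before this code executes.
    import socket

    from dask.distributed import Client

    # Connect to the manually started scheduler ('head-node' is a placeholder).
    client = Client('tcp://head-node:8786')

    # Sanity check: report which hosts the workers actually landed on.
    print(client.run(socket.gethostname))

    # Fan a trivial computation out across the ad hoc cluster.
    futures = client.map(lambda x: x * x, range(100))
    print(sum(client.gather(futures)))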
“…That integration allows users to create Hadoop jobs with ease via a GUI, but it also requires users to set up and configure a distributed Hadoop cluster on the HPC platform. Hadoop-on-HPC [2] integrates Hadoop with RADICAL-Pilot (RP): Hadoop extends RP with the MapReduce programming paradigm, and RP's pilot capabilities enable deployment of the Hadoop cluster and scheduling of the workflow's tasks on that cluster.…”
Section: Related Work
confidence: 99%
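As an illustration of the pilot workflow described above, here is a minimal sketch based on the 2016-era RADICAL-Pilot API (class and argument names differ in newer releases). The resource label and executable are placeholders, and the Hadoop/YARN deployment that the paper performs inside the pilot is not shown.

    # A minimal sketch of the pilot pattern: a pilot acquires HPC nodes once,
    # then tasks are scheduled onto it instead of through the batch queue.
    import radical.pilot as rp

    session = rp.Session()
    try:
        # The pilot requests a multi-node allocation on the HPC machine.
        pmgr  = rp.PilotManager(session=session)
        pdesc = rp.ComputePilotDescription()
        pdesc.resource = 'xsede.wrangler'   # placeholder resource label
        pdesc.cores    = 48
        pdesc.runtime  = 30                 # minutes

        pilot = pmgr.submit_pilots(pdesc)

        # Compute units are scheduled onto the running pilot.
        umgr = rp.UnitManager(session=session)
        umgr.add_pilots(pilot)

        cuds = []
        for _ in range(4):
            cud = rp.ComputeUnitDescription()
            cud.executable = '/bin/date'    # stand-in for a MapReduce-stage task
            cuds.append(cud)

        umgr.submit_units(cuds)
        umgr.wait_units()
    finally:
        session.close()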
“…We integrated two existing, independently developed software systems and executed the workflows of two exemplar use cases on HPC resources. Based on our previous experience with integrating independent systems [1], [2], [11] and our building blocks approach [3], we describe an integration that minimizes changes in the existing code-bases while allowing users to benefit from the capabilities of both systems. We show how to approach the integration, evaluate diverse integration points and align the programming and execution models of the two systems.…”
Section: Introduction
confidence: 99%