Abstract: In this paper, we present JUMMP, the Job Uninterrupted Maneuverable MapReduce Platform, an automated scheduling platform that provides a customized Hadoop environment within a batch-scheduled cluster. JUMMP enables an interactive, pseudo-persistent MapReduce platform within the existing administrative structure of an academic high-performance computing center by "jumping" between nodes with minimal administrative effort. Jumping is implemented by the synchronization of stopping and starting daemon p…
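Although the abstract is truncated above, the "jump" it describes, a synchronized stop and start of Hadoop daemons across batch-allocated nodes, can be sketched in a few lines. The sketch below is a hypothetical illustration rather than JUMMP's actual code: the host names, install prefix, exclude-file path, and SSH-based orchestration are assumptions (JUMMP drives these steps through the batch scheduler itself), while the Hadoop 1.x commands (hadoop-daemon.sh, dfsadmin) are standard.

#!/usr/bin/env python
"""Hypothetical sketch of a single JUMMP-style "jump": retire the Hadoop
daemons on an expiring batch node and bring replacements up on a freshly
allocated one.  Paths and host names are assumptions."""
import subprocess
import time

HADOOP_BIN   = "/opt/hadoop/bin"             # assumed install prefix
EXCLUDE_FILE = "/opt/hadoop/conf/excludes"   # assumed dfs.hosts.exclude file

def ssh(host, cmd):
    """Run a command on a cluster node (password-less SSH assumed)."""
    subprocess.check_call(["ssh", host, cmd])

def jump(old_node, new_node):
    # 1. Start replacement daemons on the newly allocated node so the
    #    cluster never drops below its target capacity.
    ssh(new_node, HADOOP_BIN + "/hadoop-daemon.sh start datanode")
    ssh(new_node, HADOOP_BIN + "/hadoop-daemon.sh start tasktracker")

    # 2. Decommission the expiring DataNode so HDFS re-replicates its
    #    blocks before the batch scheduler kills the job.
    with open(EXCLUDE_FILE, "a") as f:
        f.write(old_node + "\n")
    subprocess.check_call([HADOOP_BIN + "/hadoop", "dfsadmin", "-refreshNodes"])

    # 3. Wait for decommissioning to finish, then stop the old daemons.
    while b"Decommission in progress" in subprocess.check_output(
            [HADOOP_BIN + "/hadoop", "dfsadmin", "-report"]):
        time.sleep(30)
    ssh(old_node, HADOOP_BIN + "/hadoop-daemon.sh stop tasktracker")
    ssh(old_node, HADOOP_BIN + "/hadoop-daemon.sh stop datanode")

if __name__ == "__main__":
    jump("node042", "node117")   # hypothetical host names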
“…Hadoop and its ecosystem have been ported to HPC systems [18], [19], [20], enabling the use of the MapReduce programming model. While some ensemble applications are data-flow oriented and thus amenable to be implemented with MapReduce, EnTK adopts a more flexible and coarse-grained notion of tasks, where a task in EnTK can support multiple programming models, including MPI.…”
Many scientific problems require multiple distinct computational tasks to be executed in order to achieve a desired solution. We introduce the Ensemble Toolkit (EnTK) to address the challenges of scale, diversity and reliability they pose. We describe the design and implementation of EnTK, characterize its performance and integrate it with two exemplar use cases: seismic inversion and adaptive analog ensembles. We perform nine experiments, characterizing EnTK overheads, strong and weak scalability, and the performance of the two use case implementations, at scale and on production infrastructures. We show how EnTK meets the following general requirements: (i) implementing dedicated abstractions to support the description and execution of ensemble applications; (ii) support for execution on heterogeneous computing infrastructures; (iii) efficient scalability up to O(10^4) tasks; and (iv) task-level fault tolerance. We discuss novel computational capabilities that EnTK enables and the scientific advantages arising thereof. We propose EnTK as an important addition to the suite of tools in support of production scientific computing.
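EnTK's core abstractions, pipelines composed of stages composed of tasks and executed by an application manager, are exposed through a public Python API. A minimal sketch of an ensemble expressed with it might look as follows; the executables and resource label are placeholders, and the exact AppManager and resource-description parameters vary across EnTK releases.

from radical.entk import Pipeline, Stage, Task, AppManager

# One pipeline: an ensemble of simulations followed by a single analysis.
p = Pipeline()

sim = Stage()
for i in range(16):                        # 16 ensemble members
    t = Task()
    t.executable = '/path/to/simulate'     # placeholder executable
    t.arguments  = ['--member', str(i)]
    sim.add_tasks(t)

ana = Stage()
t = Task()
t.executable = '/path/to/analyze'          # placeholder executable
ana.add_tasks(t)

p.add_stages(sim)                          # stages run in order;
p.add_stages(ana)                          # tasks in a stage run concurrently

amgr = AppManager()                        # connection settings omitted
amgr.resource_desc = {'resource': 'local.localhost',   # placeholder target
                      'walltime': 30,
                      'cpus'    : 16}
amgr.workflow = [p]
amgr.run()

Because a task is just a described executable, an ensemble member can be an MPI program rather than a map or reduce function, which is the flexibility the citation above contrasts with MapReduce.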
“…The individual nodes can join or leave the cluster with minimal operational overhead, but otherwise ensure the survivability of the system. One example of a system that survives in this manner is JUMMP [21].…”
Section: The Defensive Maneuver Cyber Platform Model
Distributed and parallel applications are critical information technology systems in multiple sectors, including academia, the military, government, finance, medicine, and transportation. These applications present target-rich environments for malicious attackers seeking to disrupt their confidentiality, integrity, and availability. Applying the military concept of defensive cyber maneuver to these systems can provide protection and defense mechanisms that allow survivability and operational continuity. Understanding the trade-offs between information systems security and operational performance when applying maneuver principles is of interest to administrators, users, and researchers. To this end, we present a model of a defensive maneuver cyber platform using Stochastic Petri Nets. This model enables the understanding and evaluation of the costs and benefits of maneuverability in a distributed application environment, focusing specifically on moving target defense and deceptive defense strategies.
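The trade-off such a model quantifies can be illustrated with a toy two-transition race; this is an invented example, not the paper's Petri net, which captures moving-target and deceptive defenses in far more structural detail. Maneuvering more often shrinks the attacker's window but consumes more operational time:

import random

# Toy race between attacker dwell time and defensive "jumps".  All rates
# and the cost model are invented for illustration only.
ATTACK_RATE   = 1.0 / 8.0    # mean time-to-compromise: 8 hours
MANEUVER_RATE = 1.0 / 2.0    # mean time between maneuvers: 2 hours
JUMP_OVERHEAD = 0.1          # hours of lost work per maneuver

def simulate(trials=100_000):
    compromised = overhead = 0.0
    for _ in range(trials):
        t_attack = random.expovariate(ATTACK_RATE)
        t_jump   = random.expovariate(MANEUVER_RATE)
        if t_attack < t_jump:
            compromised += 1           # attacker wins this interval
        else:
            overhead += JUMP_OVERHEAD  # maneuver resets the attacker, at a cost

    print("P(compromise before next maneuver) ~ %.3f" % (compromised / trials))
    print("mean overhead per interval (h)     ~ %.3f" % (overhead / trials))

simulate()

With these invented rates, the per-interval compromise probability is ATTACK_RATE / (ATTACK_RATE + MANEUVER_RATE) = 0.2; making such quantities, and their operational costs, explicit and analyzable is what the Stochastic Petri Net formulation provides.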
“…To achieve interoperability, several frameworks explore the use of Hadoop on HPC resources. Various frameworks for running Hadoop on HPC have emerged, e.g., Hadoop on Demand [27], JUMMP [28], MagPie [29], MyHadoop [30], My-Cray [31]. While these frameworks can spawn and manage Hadoop clusters, many challenges remain in optimizing configurations and resource usage, including the use of available SSDs for the shuffle phase, of parallel filesystems, and of high-end network features, e.g., RDMA [32].…”
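As a concrete instance of the configuration burden noted above, a per-job Hadoop deployment on an HPC node typically has to redirect its shuffle and temporary directories to node-local SSD scratch, much as templating tools like MyHadoop do. In the sketch below the property names are standard Hadoop/YARN keys, while the scratch path and output location are assumptions.

import xml.etree.ElementTree as ET

SSD_SCRATCH = "/scratch/ssd/hadoop"     # assumed node-local SSD path

props = {
    # YARN NodeManager spill/shuffle directories on fast local storage
    "yarn.nodemanager.local-dirs": SSD_SCRATCH + "/nm-local",
    # General temporary space
    "hadoop.tmp.dir": SSD_SCRATCH + "/tmp",
}

conf = ET.Element("configuration")
for name, value in props.items():
    prop = ET.SubElement(conf, "property")
    ET.SubElement(prop, "name").text  = name
    ET.SubElement(prop, "value").text = value

# Written into the per-job configuration directory picked up by the
# spawned Hadoop cluster (location is an assumption).
ET.ElementTree(conf).write("yarn-site.xml")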
Abstract: High-performance computing platforms such as "supercomputers" have traditionally been designed to meet the compute demands of scientific applications. Consequently, they have been architected as net producers, not consumers, of data. The Apache Hadoop ecosystem has evolved to meet the requirements of data processing applications and has addressed many of the traditional limitations of HPC platforms. There exists, however, a class of scientific applications that needs the collective capabilities of traditional high-performance computing environments and the Apache Hadoop ecosystem. For example, the scientific domains of bio-molecular dynamics, genomics and network science need to couple traditional computing with Hadoop/Spark-based analysis. We investigate the critical question of how to present the capabilities of both computing environments to such scientific applications. While this question needs answers at multiple levels, we focus on the design of resource management middleware that might support the needs of both. We propose extensions to the Pilot-Abstraction so as to provide a unifying resource management layer. This is an important step towards interoperable use of HPC and Hadoop/Spark. It also allows applications to integrate HPC stages (e.g., simulations) with data analytics. Many supercomputing centers have started to officially support Hadoop environments, either in a dedicated environment or in hybrid deployments using tools such as myHadoop. This typically involves many intrinsic, environment-specific details that must be mastered and that often swamp conceptual issues such as: How best to couple HPC and Hadoop application stages? How to explore runtime trade-offs (data locality vs. data movement)? This paper provides both conceptual understanding and practical solutions to the integrated use of HPC and Hadoop environments. Our experiments are performed on state-of-the-art production HPC environments and provide middleware for multiple domain sciences.
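The abstract does not show the extended Pilot API itself, so the following stand-in only illustrates the coupling pattern the paper argues for: a single resource allocation whose nodes first serve an HPC (MPI) simulation stage and then a Spark analysis stage, keeping the intermediate data node-local. The MiniPilot class, host names, and executables are all hypothetical; mpirun's host-list flag varies by MPI implementation, and a standalone Spark master on the first node is assumed to be running.

import subprocess

class MiniPilot:
    """Hypothetical stand-in for a unifying pilot: one batch allocation
    whose nodes can host either MPI or Spark stages."""
    def __init__(self, nodes):
        self.nodes = nodes                 # host names from the scheduler

    def run_mpi(self, exe, nprocs):
        # HPC stage, e.g. a molecular-dynamics simulation writing
        # trajectories to node-local storage.  (--host syntax is Open MPI;
        # other MPI implementations differ.)
        subprocess.check_call(["mpirun", "-np", str(nprocs),
                               "--host", ",".join(self.nodes), exe])

    def run_spark(self, script):
        # Analytics stage on the same nodes, against a standalone Spark
        # master assumed to be running on nodes[0].
        subprocess.check_call(["spark-submit",
                               "--master",
                               "spark://%s:7077" % self.nodes[0],
                               script])

pilot = MiniPilot(["node01", "node02", "node03", "node04"])  # hypothetical
pilot.run_mpi("./md_simulation", nprocs=64)    # simulation stage
pilot.run_spark("analyze_trajectories.py")     # in-place analysis stage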