2011 IEEE International Parallel & Distributed Processing Symposium
DOI: 10.1109/ipdps.2011.118

CATCH: A Cloud-Based Adaptive Data Transfer Service for HPC

Abstract: Modern High Performance Computing (HPC) applications process very large amounts of data. A critical research challenge lies in transporting input data to the HPC center from a number of distributed sources, e.g., scientific experiments and web repositories, and offloading the result data to geographically distributed, intermittently available end-users, often over under-provisioned connections. Such end-user data services are typically performed using point-to-point transfers that are designed for …

Cited by 22 publications (10 citation statements). References 12 publications.

“…ElasticSite [39] sent part of the Grid workload to the Cloud when user demand overloaded it. CATCH [37] utilized the Cloud storage service for better data access between the desktop worker and the HPC center. Our work focuses on using the VC resources, and only the missed-deadline tasks are re-scheduled on the Cloud resources to improve the percentage of successful workflows.…”
Section: Related Work (mentioning, confidence: 99%)
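As a rough illustration of the rescheduling policy described in the statement above (run tasks on volunteer-computing resources first, re-submit only the deadline-missed tasks to the cloud), consider the following Python sketch. It is not code from the citing paper; the run_on_vc and run_on_cloud helpers and the task representation are hypothetical stand-ins.

# Hedged sketch of the deadline-driven fallback policy quoted above.
# NOT from the cited paper; run_on_vc / run_on_cloud are hypothetical
# stand-ins for "execute on volunteer-computing (VC) resources" and
# "execute on rented cloud resources".

def schedule(tasks, run_on_vc, run_on_cloud):
    """tasks: iterable of (task, deadline) pairs, deadlines in epoch seconds.

    Every task is first attempted on free VC resources; only a task whose
    VC execution finishes after its deadline is re-scheduled on the cloud,
    raising the fraction of workflows that complete successfully while
    keeping cloud cost low.
    """
    met_on_vc, rescheduled = [], []
    for task, deadline in tasks:
        finished_at = run_on_vc(task)      # returns completion time (epoch s)
        if finished_at <= deadline:
            met_on_vc.append(task)
        else:
            run_on_cloud(task)             # paid fallback for the missed task
            rescheduled.append(task)
    return met_on_vc, rescheduled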
“…CyberShake [27], which is used by the Southern California Earthquake Center to characterize the earthquake hazards in a region; Sipht [28], which is used to automate the search for untranslated RNAs (sRNAs) for the bacterial replicons in the NCBI database; and LIGO [37], which is used to generate and analyze gravitational waveforms from data collected during the coalescing of compact binary systems. These scientific workflows are generated with 30 tasks by Bharathi et al. [5,28].…”
Section: Workload Model (mentioning, confidence: 99%)
“…We have also begun developing a decentralized data delivery service for HPC applications. For our initial investigation, we have relied on both cloud [13] and non-cloud [12] resources, and PlanetLab [1] for investigating the effectiveness of our approach.…”
Section: Preliminary Work (mentioning, confidence: 99%)
“…Approach feasibility using Azure: We have begun the integration of cloud and HPC by developing a data transfer service, CATCH [13]. CATCH uses Azure [10] for cloud storage and FUSE [4] to provide HPC jobs with a transparent file system mount point for accessing the cloud resources.…”
Section: Preliminary Work (mentioning, confidence: 99%)
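The statement above outlines CATCH's mechanism: cloud (Azure) storage exposed to HPC jobs through a FUSE mount, so jobs read remote data as ordinary local files. A minimal sketch of that general idea follows, using the fusepy Python bindings; it illustrates the technique only and is not CATCH's actual implementation. The CloudStore client, its list/size/read_range methods, and the /mnt/catch mount point are all assumptions made for this sketch.

# Minimal sketch of a FUSE mount over cloud storage, in the spirit of the
# CATCH design described above. NOT the actual CATCH implementation.
# Requires the fusepy package; CloudStore is a hypothetical blob client
# (a real one would wrap e.g. Azure blob storage).
import errno
import stat
from fuse import FUSE, FuseOSError, Operations

class CloudStore:
    """In-memory fake of a cloud-blob client, to keep the sketch runnable."""
    def __init__(self):
        self._blobs = {"example.dat": b"hello from the cloud\n"}

    def list(self):
        return list(self._blobs)

    def size(self, name):
        return len(self._blobs[name])

    def read_range(self, name, offset, length):
        return self._blobs[name][offset:offset + length]

class CloudFS(Operations):
    """Read-only view of a CloudStore as a flat directory of files."""
    def __init__(self, store):
        self.store = store

    def getattr(self, path, fh=None):
        if path == "/":
            return {"st_mode": stat.S_IFDIR | 0o755, "st_nlink": 2}
        name = path.lstrip("/")
        if name not in self.store.list():
            raise FuseOSError(errno.ENOENT)
        return {"st_mode": stat.S_IFREG | 0o444,
                "st_nlink": 1,
                "st_size": self.store.size(name)}

    def readdir(self, path, fh):
        return [".", ".."] + self.store.list()

    def read(self, path, size, offset, fh):
        # Fetch only the requested byte range from the cloud store, so
        # jobs can stream large inputs without a full download first.
        return self.store.read_range(path.lstrip("/"), offset, size)

if __name__ == "__main__":
    # The mount point is an assumption; an HPC job would then read
    # /mnt/catch/example.dat like any local file.
    FUSE(CloudFS(CloudStore()), "/mnt/catch", foreground=True, ro=True)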
“…In recent years, MapReduce/Hadoop [1] has emerged as the de facto model for big data applications, and is employed by industry [2], [3], [4], [5] and academia [6], [7], [8] alike. Improving the efficiency of Hadoop is therefore crucial.…”
Section: Introduction (mentioning, confidence: 99%)