2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
DOI: 10.1109/sc.2010.28
Functional Partitioning to Optimize End-to-End Performance on Many-core Architectures

Abstract: Scaling computations on emerging massive-core supercomputers is a daunting task, which, coupled with significantly lagging system I/O capabilities, exacerbates applications' end-to-end performance. The I/O bottleneck often negates the potential performance benefits of assigning additional compute cores to an application. In this paper, we address this issue via a novel functional partitioning (FP) runtime environment that allocates cores to specific application tasks - checkpointing, de-duplication, and …

Cited by 57 publications (30 citation statements). References 35 publications (39 reference statements).
“…The use of dedicated I/O cores [5], [26]-[28], threads [29], [30], or dedicated nodes [31], [32] (also termed "staging areas") is becoming more and more common. These strategies overlap I/O with computation by shipping data to dedicated resources, and offer more liberty in delaying actual I/O accesses.…”
Section: B. Application-Side I/O Scheduling
Confidence: 99%
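The staging strategy described in the excerpt above can be sketched as a minimal dedicated I/O thread: the compute thread enqueues checkpoint chunks and continues immediately, while a separate thread drains the queue to disk, overlapping I/O with computation. This is an illustrative sketch, not the paper's FP runtime; the names (`DedicatedIOThread`, `write_async`) are hypothetical.

```python
import os
import queue
import tempfile
import threading

class DedicatedIOThread:
    """Ships write requests to a dedicated thread so compute can continue."""
    _STOP = object()  # sentinel that tells the drain loop to exit

    def __init__(self, path):
        self.path = path
        self.q = queue.Queue()
        self.t = threading.Thread(target=self._drain, daemon=True)
        self.t.start()

    def _drain(self):
        # All actual file I/O happens here, off the compute thread.
        with open(self.path, "wb") as f:
            while True:
                chunk = self.q.get()
                if chunk is self._STOP:
                    break
                f.write(chunk)

    def write_async(self, data):
        # Returns immediately; the write is delayed and overlapped with compute.
        self.q.put(data)

    def close(self):
        self.q.put(self._STOP)
        self.t.join()

if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "ckpt.bin")
    w = DedicatedIOThread(path)
    for i in range(3):
        w.write_async(bytes([i]) * 4)  # compute thread keeps going
    w.close()
```

On a many-core node, the same pattern generalizes from a thread to a whole core or node set aside for I/O, which is the space-partitioning idea the surveyed works share.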
“…Space-partitioning using dedicated cores to handle I/O or visualization tasks has been proposed using a FUSE interface [16] or an active buffering scheme for collective I/O [20]. The use of a FUSE interface produces multiple copies of data passing through the kernel space, increasing memory usage.…”
Section: Tightly-Coupled ISV: Challenges and Solutions
Confidence: 99%
“…In our design, the client and center stubs talk to a transparent file system mount point, provided through FUSE [8] as Cloud FS (Figure 1), which abstracts the process of accessing the cloud storage and, in addition, moves the data closer to the end-user or the HPC center. The use of FUSE to abstract access to different storage substrates has gained widespread popularity due to the ease with which purpose-built storage systems can be transparently made available by having them implement certain POSIX APIs (e.g., s3fs [15] for Amazon S3, or stdchk [16], [17], a file system atop distributed storage of disks, memory, or SSD). The read() or write() call in these situations typically abstracts parallel striping or a network transfer, respectively.…”
Section: Data Transport as a File System
Confidence: 99%
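The core idea in this excerpt - a POSIX-looking read()/write() that actually hides a network transfer - can be illustrated without a real FUSE mount. The sketch below is hypothetical (the names `SimulatedCloudStore` and `CloudFSFile` are not from the paper); in an actual FUSE file system such as the cited Cloud FS or s3fs, these methods would run inside kernel-dispatched FUSE callbacks rather than a dict lookup.

```python
class SimulatedCloudStore:
    """Hypothetical stand-in for a remote object store (what s3fs-like systems talk to)."""
    def __init__(self):
        self.objects = {}

class CloudFSFile:
    """A file-like object whose write()/read() hide a 'network transfer'."""
    def __init__(self, store, key):
        self.store = store
        self.key = key
        self.buf = bytearray()

    def write(self, data):
        # Data is staged locally; nothing leaves the node yet.
        self.buf.extend(data)

    def close(self):
        # The POSIX-looking close() is where the simulated transfer happens.
        self.store.objects[self.key] = bytes(self.buf)

    def read(self):
        # In a real system this would abstract parallel striping or a fetch.
        return self.store.objects[self.key]

if __name__ == "__main__":
    store = SimulatedCloudStore()
    f = CloudFSFile(store, "dataset.bin")
    f.write(b"payload")
    f.close()
    print(CloudFSFile(store, "dataset.bin").read())
```

The design point the excerpt makes is that applications keep using unmodified POSIX calls while the storage substrate underneath (cloud object store, distributed scavenged storage, SSDs) can be swapped freely.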