As supercomputers grow, understanding their behavior and performance has become increasingly challenging. New hurdles in scalability, programmability, power consumption, reliability, cost, and cooling are emerging, along with new technologies such as 3D integration, GP-GPUs, silicon-photonics, and other "game changers". Currently, they HPC community lacks a unified toolset to evaluate these technologies and design for these challenges.To address this problem, a number of institutions have joined together to create the Structural Simulation Toolkit (SST), an open, modular, parallel, multi-criteria, multi-scale simulation framework. The SST includes a number of processor, memory, and network models. The SST has been used in a variety of network, memory, and application studies and aims to become the standard simulation framework for designing and procuring HPC systems.
SUMMARYApplications running on leadership platforms are more and more bottlenecked by storage input/output (I/O). In an effort to combat the increasing disparity between I/O throughput and compute capability, we created Adaptable IO System (ADIOS) in 2005. Focusing on putting users first with a service oriented architecture, we combined cutting edge research into new I/O techniques with a design effort to create near optimal I/O methods. As a result, ADIOS provides the highest level of synchronous I/O performance for a number of mission critical applications at various Department of Energy Leadership Computing Facilities. Meanwhile ADIOS is leading the push for next generation techniques including staging and data processing pipelines. In this paper, we describe the startling observations we have made in the last half decade of I/O research and development, and elaborate the lessons we have learned along this journey. We also detail some of the challenges that remain as we look toward the coming Exascale era.
Today's high-end massively parallel processing (MPP) machines have thousands to tens of thousands of processors, with next-generation systems planned to have in excess of one hundred thousand processors. For systems of such scale, efficient I/O is a significant challenge that cannot be solved using traditional approaches. In particular, general purpose parallel file systems that limit applications to standard interfaces and access policies do not scale and will likely be a performance bottleneck for many scientific applications.In this paper, we investigate the use of a "lightweight" approach to I/O that requires the application or I/O-library developer to extend a core set of critical I/O functionality with the minimum set of features and services required by its target applications. We argue that this approach allows the development of I/O libraries that are both scalable and secure. We support our claims with preliminary results for a lightweight checkpoint operation on a development cluster at Sandia.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.