Uintah is a computational framework for fluidstructure interaction problems using a combination of the ICE fluid flow algorithm, adaptive mesh refinement (AMR) and MPM particle methods. Uintah uses domain decomposition with a taskgraph approach for asynchronous communication and automatic message generation. The Uintah software has been used for a decade with its original task scheduler that ran computational tasks in a predefined static order. In order to improve the performance of Uintah for petascale architecture, a new dynamic task scheduler allowing better overlapping of the communication and computation is designed and evaluated in this study. The new scheduler supports asynchronous, out-of-order scheduling of computational tasks by putting them in a distributed directed acyclic graph (DAG) and by isolating task memory and keeping multiple copies of task variables in a data warehouse when necessary. A new runtime system has been implemented with a two-stage priority queuing architecture to improve the scheduling efficiency. The effectiveness of this new approach is shown through an analysis of the performance of the software on large scale fluid-structure examples.
Abstract-Modern scientific discovery is driven by an insatiable demand for computing performance. The HPC community is targeting development of supercomputers able to sustain 1 ExaFlops by the year 2020 and power consumption is the primary obstacle to achieving this goal. A combination of architectural improvements, circuit design, and manufacturing technologies must provide over a 20× improvement in energy efficiency. In this paper, we present some of the progress NVIDIA Research is making toward the design of Exascale systems by tailoring features to address the scaling challenges of performance and energy efficiency. We evaluate several architectural concepts for a set of HPC applications demonstrating expected energy efficiency improvements resulting from circuit and packaging innovations such as low-voltage SRAM, low-energy signaling, and on-package memory. Finally, we discuss the scaling of these features with respect to future process technologies and provide power and performance projections for our Exascale research architecture.
SUMMARYBlock-structured adaptive mesh refinement (BSAMR) is widely used within simulation software because it improves the utilization of computing resources by refining the mesh only where necessary. For BSAMR to scale onto existing petascale and eventually exascale computers all portions of the simulation need to weak scale ideally. Any portions of the simulation that do not will become a bottleneck at larger numbers of cores. The challenge is to design algorithms that will make it possible to avoid these bottlenecks on exascale computers. One step of existing BSAMR algorithms involves determining where to create new patches of refinement. The Berger-Rigoutsos algorithm is commonly used to perform this task. This paper provides a detailed analysis of the performance of two existing parallel implementations of the BergerRigoutsos algorithm and develops a new parallel implementation of the Berger-Rigoutsos algorithm and a tiled algorithm that exhibits ideal scalability. The analysis and computational results up to 98 304 cores are used to design performance models which are then used to predict how these algorithms will perform on 100 M cores.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.