large shared-memory multi-core configurations [16,28,35]. For example, OpenMP, the most popular approach for shared-memory programming, has evolved significantly and now incorporates advanced features such as tasking support [4,39]. For all these reasons, parallel operations such as scheduling and synchronization are expected to become key system software components. As a result, simulators targeting next-generation HPC systems must take into account the parallel operations performed at the runtime system level.

Existing tools make the simulation of large-scale HPC machines with thousands of cores infeasible. Conventional cycle-accurate architectural simulators offer a high level of detail, but their simulation times become impractical beyond a few tens [6,7,51] or a few hundreds of cores [45]. Higher-level simulators can simulate thousands of cores, but only by not modelling any microarchitectural details or the impact of the system software [2,14,55]. Raising the level of abstraction is necessary, but it must be done to an appropriate degree. Hence, it is critical to develop flexible simulation infrastructures that make it possible to quickly trim the vast design space while still capturing the impact of the simulated microarchitecture and system software.

In this paper we make the following contributions:

• We present MUSA, a multi-scale simulation approach that enables fast and accurate performance estimations of next-generation HPC machines. Our methodology seamlessly captures inter-node communication as well as intra-node microarchitectural and system software interactions, improving usability and simplifying the simulation workflow. MUSA relies on native execution traces with two levels of detail to allow the simulation of different communication networks, numbers of cores per node, and relevant microarchitectural parameters.
• We validate MUSA using the NAS Multi-Zone Parallel Benchmark suite [27], and then evaluate three large-scale case studies (with up to 16,384 cores) using BT-MZ, HYDRO [33], and SPECFEM3D [31]. Our evaluation shows that MUSA provides accurate performance predictions by combining information at different levels of granularity. When comparing native executions and MUSA simulations with up to 2,048 cores, we achieve relative errors within 10% in the common case, demonstrating that our detailed model is able to capture microarchitectural and system software effects. In addition, we show that our simulations complete in an affordable amount of

Abstract: The complexity of High Performance Computing (HPC) systems is increasing, both in the number of components and in their heterogeneity. Interactions between software and hardware involve many different aspects that are typically not transparent to scientific programmers and system architects. Therefore, predicting the behavior of current scientific applications on future HPC infrastructures is a challenging task.

In this paper we present MUSA, an end-to-end methodology that employs a multi-level simulation infrastructure. By combining different lev...