For complex engineering and scientific applications, Computational Fluid Dynamics (CFD) simulations require a huge amount of computational power. It is therefore of paramount importance to carefully assess the performance of CFD codes and to study them in depth to enable optimisation and portability. In this paper, we study three complex CFD codes, OpenFOAM, Alya and CHORUS, representing two numerical methods, namely the finite-volume and finite-element methods, on both structured and unstructured meshes. To all codes, we apply a generic performance analysis method based on a set of metrics that help the code developer spot the critical points that can potentially limit the scalability of a parallel application. We show the root cause of the performance bottlenecks by studying the three applications on the MareNostrum4 supercomputer. We conclude by providing hints for improving the performance and the scalability of each application.
HPC systems and parallel applications are increasing in complexity. Therefore, the ability to easily study the performance of scientific applications and project it at large scale is of paramount importance. In this paper, we describe a performance analysis method and apply it to four complex HPC applications. We perform our study on a pre-production HPC system powered by the latest Arm-based CPUs for HPC, the Marvell ThunderX2. For each application, we spot inefficiencies and factors that limit its scalability. The results show that in several cases the bottlenecks do not come from the hardware but from the way applications are programmed or the way the system software is configured.
Clusters of emerging technologies are appearing with increasing frequency in HPC. After years of skepticism, data centers are adopting them as production systems thanks to several geopolitical and technological factors. The most prominent example is the Fugaku supercomputer, powered by the latest Fujitsu A64FX CPU. How do mature HPC codes behave on such emerging-technology clusters? What performance will scientists obtain when running their HPC applications "as is" on these clusters? This paper presents the evaluation of CTE-Arm, a Fugaku-like system, including both fine-tuned microbenchmarks and five scientific applications run without prior fine-tuning: Alya, NEMO, Gromacs, OpenIFS, and WRF. Results show that while micro-architectural benchmarks perform as expected, HPC applications not tuned for a specific architecture run between 2× and 4× slower than on a standard Intel-based HPC system. Therefore, further effort is needed to improve tools (e.g., compilers) and system software (e.g., MPI libraries) to ease application deployment and improve performance.