waLBerla is a massively parallel software framework for simulating complex flows with the lattice Boltzmann method (LBM). Performance and scalability results are presented for SuperMUC, the world's fastest x86-based supercomputer ranked number 6 on the Top500 list, and JUQUEEN, a Blue Gene/Q system ranked as number 5.We reach resolutions with more than one trillion cells and perform up to 1.93 trillion cell updates per second using 1.8 million threads. The design and implementation of waLBerla is driven by a careful analysis of the performance on current petascale supercomputers. Our fully distributed data structures and algorithms allow for efficient, massively parallel simulations on these machines. Elaborate node level optimizations and vectorization using SIMD instructions result in highly optimized compute kernels for the single-and two-relaxation-time LBM. Excellent weak and strong scaling is achieved for a complex vascular geometry of the human coronary tree.
Programming current supercomputers efficiently is a challenging task. Multiple levels of parallelism on the core, on the compute node, and between nodes need to be exploited to make full use of the system. Heterogeneous hardware architectures with accelerators further complicate the development process. waLBerla addresses these challenges by providing the user with highly efficient building blocks for developing simulations on block-structured grids. The block-structured domain partitioning is flexible enough to handle complex geometries, while the structured grid within each block allows for highly efficient implementations of stencil-based algorithms. We present several example applications realized with waLBerla, ranging from lattice Boltzmann methods to rigid particle simulations. Most importantly, these methods can be coupled together, enabling multiphysics simulations. The framework uses meta-programming techniques to generate highly efficient code for CPUs and GPUs from a symbolic method formulation. To ensure software quality and performance portability, a continuous integration toolchain automatically runs an extensive test suite encompassing multiple compilers, hardware architectures, and software configurations.
The motion of ionic solutes and charged particles under the influence of an electric field and the ensuing hydrodynamic flow of the underlying solvent is ubiquitous in aqueous colloidal suspensions. The physics of such systems is described by a coupled set of differential equations, along with boundary conditions, collectively referred to as the electrokinetic equations. Capuani et al. [J. Chem. Phys. 121, 973 (2004)] introduced a lattice-based method for solving this system of equations, which builds upon the lattice Boltzmann algorithm for the simulation of hydrodynamic flow and exploits computational locality. However, thus far, a description of how to incorporate moving boundary conditions into the Capuani scheme has been lacking. Moving boundary conditions are needed to simulate multiple arbitrarily moving colloids. In this paper, we detail how to introduce such a particle coupling scheme, based on an analogue to the moving boundary method for the pure lattice Boltzmann solver. The key ingredients in our method are mass and charge conservation for the solute species and a partial-volume smoothing of the solute fluxes to minimize discretization artifacts. We demonstrate our algorithm's effectiveness by simulating the electrophoresis of charged spheres in an external field; for a single sphere we compare to the equivalent electro-osmotic (co-moving) problem. Our method's efficiency and ease of implementation should prove beneficial to future simulations of the dynamics in a wide range of complex nanoscopic and colloidal systems that were previously inaccessible to lattice-based continuum algorithms.
The precise modeling of vascular structures plays a key role in medical imaging applications, such as diagnosis, therapy planning and blood flow simulations. For the simulation of blood flow in particular, high-precision models are required to produce accurate results. It is thus common practice to perform extensive manual data polishing on vascular segmentations prior to simulation. This usually involves a complex tool chain which is highly impractical for clinical on-site application. To close this gap in current blood flow simulation pipelines, we present a novel technique for interactive vascular modeling which is based on implicit sweep surfaces. Our method is able to generate and correct smooth high-quality models based on geometric centerline descriptions on the fly. It supports complex vascular free-form contours and consequently allows for an accurate and fast modeling of pathological structures such as aneurysms or stenoses. We extend the concept of implicit sweep surfaces to achieve increased robustness and applicability as required in the medical field. We finally compare our method to existing techniques and provide case studies that confirm its contribution to current simulation pipelines.
Microstructures forming during ternary eutectic directional solidification processes have significant influence on the macroscopic mechanical properties of metal alloys. For a realistic simulation, we use the well established thermodynamically consistent phase-field method and improve it with a new grand potential formulation to couple the concentration evolution. This extension is very compute intensive due to a temperature dependent diffusive concentration. We significantly extend previous simulations that have used simpler phase-field models or were performed on smaller domain sizes. The new method has been implemented within the massively parallel HPC framework waLBerla that is designed to exploit current supercomputers efficiently. We apply various optimization techniques, including buffering techniques, explicit SIMD kernel vectorization, and communication hiding. Simulations utilizing up to 262,144 cores have been run on three different supercomputing architectures and weak scalability results are shown. Additionally, a hierarchical, mesh-based data reduction strategy is developed to keep the I/O problem manageable at scale.
Realistic simulations in engineering or in the materials sciences can consume enormous computing resources and thus require the use of massively parallel supercomputers. The probability of a failure increases both with the runtime and with the number of system components. For future exascale systems it is therefore considered critical that strategies are developed to make software resilient against failures. In this article, we present a scalable, distributed, diskless, and resilient checkpointing scheme that can create and recover snapshots of a partitioned simulation domain. We demonstrate the efficiency and scalability of the checkpoint strategy for simulations with up to 40 billion computational cells executing on more than 400 billion floating point values. A checkpoint creation is shown to require only a few seconds and the new checkpointing scheme scales almost perfectly up to more than 260 000 (2 18 ) processes. To recover from a diskless checkpoint during runtime, we realize the recovery algorithms using ULFM MPI. The checkpointing mechanism is fully integrated in a state-of-the-art high-performance multi-physics simulation framework. We demonstrate the efficiency and robustness of the method with a realistic phase-field simulation originating in the material sciences and with a lattice Boltzmann method implementation.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.