Nils Kohl scite author profile

waLBerla: A block-structured high-performance framework for multiphysics simulations

Programming current supercomputers efficiently is a challenging task. Multiple levels of parallelism on the core, on the compute node, and between nodes need to be exploited to make full use of the system. Heterogeneous hardware architectures with accelerators further complicate the development process. waLBerla addresses these challenges by providing the user with highly efficient building blocks for developing simulations on block-structured grids. The block-structured domain partitioning is flexible enough to handle complex geometries, while the structured grid within each block allows for highly efficient implementations of stencil-based algorithms. We present several example applications realized with waLBerla, ranging from lattice Boltzmann methods to rigid particle simulations. Most importantly, these methods can be coupled together, enabling multiphysics simulations. The framework uses meta-programming techniques to generate highly efficient code for CPUs and GPUs from a symbolic method formulation. To ensure software quality and performance portability, a continuous integration toolchain automatically runs an extensive test suite encompassing multiple compilers, hardware architectures, and software configurations.
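
The sketch below is a minimal illustration of the block-structured idea, under stated assumptions. A 1D field is split into blocks, and each block is padded with one ghost cell per side. After a ghost-layer exchange, every block applies the same 3-point Jacobi stencil independently, and the concatenated result matches a sweep over the unpartitioned grid. This is not waLBerla code: the function names are hypothetical, the domain is 1D rather than 3D, and the in-memory copy in `exchange_ghosts` stands in for the communication between processes that a real framework would perform.

```python
from typing import List

Block = List[float]  # interior values plus one ghost cell on each side


def split_into_blocks(u: List[float], block_size: int) -> List[Block]:
    """Partition a 1D field into blocks padded with one ghost cell per side."""
    return [[0.0] + u[s:s + block_size] + [0.0] for s in range(0, len(u), block_size)]


def exchange_ghosts(blocks: List[Block], left_bc: float, right_bc: float) -> None:
    """Fill each block's ghost cells from its neighbors (or the boundary values)."""
    for i, b in enumerate(blocks):
        b[0] = blocks[i - 1][-2] if i > 0 else left_bc
        b[-1] = blocks[i + 1][1] if i + 1 < len(blocks) else right_bc


def jacobi_sweep(b: Block) -> Block:
    """Apply the 3-point Jacobi stencil for u'' = 0 to the block interior only."""
    interior = [0.5 * (b[j - 1] + b[j + 1]) for j in range(1, len(b) - 1)]
    return [b[0]] + interior + [b[-1]]


def blocked_sweep(u: List[float], block_size: int,
                  left_bc: float, right_bc: float) -> List[float]:
    """One stencil sweep executed block by block, as separate processes might do it."""
    blocks = split_into_blocks(u, block_size)
    exchange_ghosts(blocks, left_bc, right_bc)
    swept = [jacobi_sweep(b) for b in blocks]   # blocks are now independent
    return [x for b in swept for x in b[1:-1]]  # drop ghosts and concatenate


def global_sweep(u: List[float], left_bc: float, right_bc: float) -> List[float]:
    """Reference: the same sweep on the unpartitioned grid."""
    p = [left_bc] + u + [right_bc]
    return [0.5 * (p[j - 1] + p[j + 1]) for j in range(1, len(p) - 1)]


u = [float(i) for i in range(10)]
print(blocked_sweep(u, 3, 0.0, 10.0) == global_sweep(u, 0.0, 10.0))  # True
```

After the exchange, each block reads only its own cells and its ghost cells. The per-block sweeps could therefore run on different cores or nodes, while the structured layout inside each block keeps the stencil loop simple.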

A scalable and extensible checkpointing scheme for massively parallel simulations

Kohl, Hötzer, Schornbaum, et al. 2018

The International Journal of High Performance Computing Applica

Realistic simulations in engineering or in the materials sciences can consume enormous computing resources and thus require the use of massively parallel supercomputers. The probability of a failure increases both with the runtime and with the number of system components. For future exascale systems, it is therefore considered critical to develop strategies that make software resilient against failures. In this article, we present a scalable, distributed, diskless, and resilient checkpointing scheme that can create and recover snapshots of a partitioned simulation domain. We demonstrate the efficiency and scalability of the checkpoint strategy for simulations with up to 40 billion computational cells operating on more than 400 billion floating-point values. Creating a checkpoint is shown to require only a few seconds, and the new checkpointing scheme scales almost perfectly up to more than 260,000 (2¹⁸) processes. To recover from a diskless checkpoint during runtime, we realize the recovery algorithms using ULFM (User Level Failure Mitigation) MPI. The checkpointing mechanism is fully integrated into a state-of-the-art high-performance multiphysics simulation framework. We demonstrate the efficiency and robustness of the method with a realistic phase-field simulation originating in the materials sciences and with a lattice Boltzmann method implementation.
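
The general idea of diskless checkpointing can be sketched as follows. The sketch assumes a simple ring-style "buddy" scheme in which each rank r also keeps a copy of the snapshot of rank r - 1 in memory. It is a generic illustration, not the paper's algorithm. Processes are plain Python objects, and all class and function names are hypothetical. The scheme tolerates the loss of any set of non-adjacent ranks. Failure detection and communicator repair, which the paper handles with ULFM MPI, are not modeled.

```python
import copy
from typing import Dict, List, Optional


class Process:
    """Hypothetical stand-in for one MPI process holding part of the domain."""

    def __init__(self, rank: int, data: List[float]):
        self.rank = rank
        self.data = data                                   # live simulation state
        self.own_snapshot: Optional[List[float]] = None    # local in-memory checkpoint
        self.buddy_snapshots: Dict[int, List[float]] = {}  # copies held for other ranks
        self.alive = True


def create_checkpoint(procs: List[Process]) -> None:
    """Snapshot every rank in memory and mirror it on the next rank in a ring."""
    n = len(procs)
    for p in procs:
        p.own_snapshot = copy.deepcopy(p.data)
    for p in procs:
        buddy = procs[(p.rank + 1) % n]
        buddy.buddy_snapshots[p.rank] = copy.deepcopy(p.own_snapshot)


def fail(p: Process) -> None:
    """Simulate a crash: the process and everything in its memory are lost."""
    p.alive = False
    p.data, p.own_snapshot, p.buddy_snapshots = [], None, {}


def recover(procs: List[Process]) -> None:
    """Restore failed ranks from their buddies, then roll all ranks back."""
    n = len(procs)
    failed = {p.rank for p in procs if not p.alive}
    for r in failed:
        b = (r + 1) % n
        if b in failed:
            raise RuntimeError(f"rank {r} and its buddy {b} both failed")
        procs[r].own_snapshot = copy.deepcopy(procs[b].buddy_snapshots[r])
        procs[r].alive = True  # stands in for a replacement process
    for p in procs:
        p.data = copy.deepcopy(p.own_snapshot)  # consistent rollback on all ranks
    create_checkpoint(procs)  # rebuild the redundant copies lost in the crash


procs = [Process(r, [float(r)] * 3) for r in range(4)]
create_checkpoint(procs)
for p in procs:
    p.data = [x + 100.0 for x in p.data]  # simulation advances past the checkpoint
fail(procs[2])
recover(procs)
print([p.data for p in procs])
# [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [2.0, 2.0, 2.0], [3.0, 3.0, 3.0]]
```

Because the redundant copy lives on a different process, recovery needs no file system access. The cost is the extra memory that every rank spends on snapshots.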

The HyTeG finite-element software framework for scalable multigrid solvers

Kohl, Thönnes, Drzisga, et al. 2018

International Journal of Parallel, Emergent and Distributed Sys

TerraNeo—Mantle Convection Beyond a Trillion Degrees of Freedom

Bauer, Bunge, Drzisga, et al. 2020

Textbook efficiency: massively parallel matrix-free multigrid for the Stokes system

Kohl, Rüde, 2020

Preprint

We employ textbook multigrid efficiency (TME), as introduced by Achi Brandt, to construct an asymptotically optimal monolithic multigrid solver for the Stokes system. The geometric multigrid solver builds upon the concept of hierarchical hybrid grids (HHG), which we extend to higher-order finite-element discretizations, and upon a corresponding matrix-free implementation. The computational cost of the full multigrid (FMG) iteration is quantified, and the solver is applied to multiple benchmark problems. Through a parameter study, we suggest configurations that achieve TME for both stabilized equal-order and Taylor-Hood discretizations. A roofline analysis demonstrates the excellent node-level performance of the relevant compute kernels. Finally, we demonstrate weak and strong scalability up to 147,456 parallel processes and solve Stokes systems with more than 3.6 × 10¹² (3.6 trillion) unknowns.
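
Full multigrid can be illustrated on a much simpler problem than Stokes. The sketch below swaps in the 1D Poisson equation -u'' = f on (0, 1) with homogeneous Dirichlet boundary conditions. It uses a 3-point finite-difference stencil applied without an assembled matrix, lexicographic Gauss-Seidel smoothing, full-weighting restriction, and linear interpolation. All of these choices are assumptions for illustration, not HyTeG's implementation. FMG starts with an exact solve on the coarsest grid, then interpolates the solution to each finer level and applies one V(2,2)-cycle there. In the spirit of textbook efficiency, the final error should be of the same order as the discretization error.

```python
import math
from typing import Callable, List, Tuple

Vec = List[float]  # grid values including both boundary points


def smooth(u: Vec, f: Vec, h: float, sweeps: int) -> None:
    """Lexicographic Gauss-Seidel sweeps for -u'' = f, applied in place."""
    for _ in range(sweeps):
        for j in range(1, len(u) - 1):
            u[j] = 0.5 * (u[j - 1] + u[j + 1] + h * h * f[j])


def residual(u: Vec, f: Vec, h: float) -> Vec:
    """r = f - A u, with the 3-point stencil applied matrix-free."""
    r = [0.0] * len(u)
    for j in range(1, len(u) - 1):
        r[j] = f[j] - (2.0 * u[j] - u[j - 1] - u[j + 1]) / (h * h)
    return r


def restrict(r: Vec) -> Vec:
    """Full weighting onto the grid with half as many intervals."""
    nc = (len(r) - 1) // 2
    rc = [0.0] * (nc + 1)
    for J in range(1, nc):
        rc[J] = 0.25 * r[2 * J - 1] + 0.5 * r[2 * J] + 0.25 * r[2 * J + 1]
    return rc


def prolong(uc: Vec) -> Vec:
    """Linear interpolation onto the grid with twice as many intervals."""
    u = [0.0] * (2 * (len(uc) - 1) + 1)
    for J, val in enumerate(uc):
        u[2 * J] = val
    for J in range(len(uc) - 1):
        u[2 * J + 1] = 0.5 * (uc[J] + uc[J + 1])
    return u


def v_cycle(u: Vec, f: Vec, h: float, nu1: int = 2, nu2: int = 2) -> None:
    """One V(nu1, nu2)-cycle, updating u in place."""
    if len(u) == 3:  # coarsest grid: a single unknown, solved exactly
        u[1] = 0.5 * h * h * f[1]
        return
    smooth(u, f, h, nu1)
    rc = restrict(residual(u, f, h))
    ec = [0.0] * len(rc)
    v_cycle(ec, rc, 2.0 * h, nu1, nu2)  # coarse-grid correction A_H e = r_H
    for j, e in enumerate(prolong(ec)):
        u[j] += e
    smooth(u, f, h, nu2)


def fmg(levels: int, rhs: Callable[[float], float]) -> Tuple[Vec, Vec, float]:
    """Full multigrid on (0, 1): exact coarse solve, then one V-cycle per level."""
    n, h = 2, 0.5
    u = [0.0, 0.5 * h * h * rhs(h), 0.0]
    f = [rhs(j * h) for j in range(n + 1)]
    for _ in range(levels - 1):
        n, h = 2 * n, h / 2.0
        u = prolong(u)  # interpolated coarse solution as the initial guess
        f = [rhs(j * h) for j in range(n + 1)]
        v_cycle(u, f, h)
    return u, f, h


def rhs(x: float) -> float:
    return math.pi ** 2 * math.sin(math.pi * x)  # exact solution u(x) = sin(pi x)


u, f, h = fmg(6, rhs)  # finest grid: 64 intervals
err_fmg = max(abs(v - math.sin(math.pi * j * h)) for j, v in enumerate(u))

u_disc = list(u)
for _ in range(30):  # iterate to the discrete solution for comparison
    v_cycle(u_disc, f, h)
err_disc = max(abs(v - math.sin(math.pi * j * h)) for j, v in enumerate(u_disc))
print(f"FMG error {err_fmg:.2e}, discretization error {err_disc:.2e}")
```

If the FMG error is already close to the discretization error, additional V-cycles only reduce an algebraic error that lies below discretization accuracy. One FMG pass therefore delivers the accuracy the grid supports, which is the behavior textbook efficiency asks for.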

Nils Kohl

waLBerla: A block-structured high-performance framework for multiphysics simulations

A scalable and extensible checkpointing scheme for massively parallel simulations

The HyTeG finite-element software framework for scalable multigrid solvers

TerraNeo—Mantle Convection Beyond a Trillion Degrees of Freedom

Textbook efficiency: massively parallel matrix-free multigrid for the Stokes system
