As compute power increases over time, larger and more complex simulations become possible. However, it becomes increasingly difficult to use the available computational resources efficiently. Especially in particle-based simulations with a spatial domain partitioning, large load imbalances can occur because the simulation is dynamic, so a static domain partitioning may not be suitable. This can significantly degrade the overall runtime of the simulation. Sophisticated load balancing strategies must be designed to alleviate this problem. In this paper, we conduct a systematic evaluation of the performance of six different load balancing algorithms. Our tests cover a wide range of simulation sizes and employ one of the largest supercomputers available. In particular, we carefully study the runtime and memory complexity of all components of the simulation. When progressing to extreme-scale simulations, it is essential to identify bottlenecks and to predict the scaling behaviour. Scaling experiments are shown for up to more than one million processes. The performance of each algorithm is analyzed with respect to the quality of the load balancing and its runtime costs. Additionally, an applied test case is used to judge the applicability of the best algorithms in real-world applications. For all tests, the waLBerla multiphysics framework is employed.

processes [6,7]. One important aspect of this initial domain partitioning is to achieve an equal workload for all cores. However, since the simulated system is dynamic and particles may migrate between subdomains, the workload may shift during the simulation. This leads to load imbalances that can slow down the whole simulation. To overcome this problem, the domain partitioning must be adapted dynamically throughout the simulation and/or the subdomains must be reassigned to different processes.

Many simulation frameworks have therefore adopted load balancing, and results have been published for simulations of various sizes.
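To make the notion of a load imbalance concrete, the following minimal sketch counts particles per subdomain in a static one-dimensional partitioning and reports the ratio of the maximum to the mean workload. The particle count as a workload measure, the max-to-mean ratio as an imbalance metric, and the helper names `count_particles` and `imbalance` are assumptions made for illustration; they are not taken from the evaluation in this paper or from the waLBerla API.

```python
def count_particles(positions, n_sub, length=1.0):
    """Count particles per subdomain for a 1D domain [0, length)
    split into n_sub equally sized slabs (a static partitioning)."""
    counts = [0] * n_sub
    width = length / n_sub
    for x in positions:
        # Clamp so that a particle exactly at the upper boundary
        # is assigned to the last subdomain.
        idx = min(int(x / width), n_sub - 1)
        counts[idx] += 1
    return counts


def imbalance(loads):
    """Ratio of maximum to mean load; 1.0 means perfectly balanced.
    Assumed metric for illustration only."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean


# Initial state: 8 particles spread evenly over 4 subdomains.
initial = [0.05, 0.20, 0.30, 0.45, 0.55, 0.70, 0.80, 0.95]
# Later state: most particles have migrated towards the lower end.
later = [0.02, 0.05, 0.10, 0.15, 0.20, 0.22, 0.30, 0.60]

for name, state in (("initial", initial), ("later", later)):
    loads = count_particles(state, n_sub=4)
    print(name, loads, imbalance(loads))
```

In this toy setup the partitioning that was balanced initially leaves one subdomain with three times the mean workload after the particles have migrated, so the process owning that subdomain determines the time per step.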
Related Work

Molecular dynamics simulations differ from rigid body dynamics in some aspects; however, the load balancing problem is closely related. Therefore, we also consider methods proposed in the context of molecular dynamics here. A slightly dated but still relevant review of methods suitable for load balancing can be found in [8]. Owen et al. [9] use load balancing based on the ParMetis [10] graph partitioning library to balance their combined FEM-DEM simulation. They use two applied test cases, namely a 2D bucket filling and a 3D hopper filling example. Measurements with up to 6 cores are presented. Deng et al. [11] present a runtime load balancing approach for molecular dynamics simulations which deforms the domain partitioning at runtime. The initial rectangular grid is optimized by moving the corners of all subdomains individually in space to adjust to the simulation. Good quality of the partitioning is reported for an artificial checkerboard scenario with no acting forces. The load balancing improves the runtime performance but...