2004
DOI: 10.1002/nla.382
Hierarchical hybrid grids: data structures and core algorithms for multigrid

Abstract: SUMMARY For many scientific and engineering applications, it is often desirable to use unstructured grids to represent complex geometries. Unfortunately, the data structures required to represent discretizations on such grids typically result in extremely inefficient performance on current high-performance architectures. Here, we introduce a grid framework using patch-wise, regular refinement that retains the flexibility of unstructured grids, while achieving performance comparable to that seen with purely structured…
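To make the patch-wise regular refinement idea concrete, here is a minimal C++ sketch in which a coarse, unstructured element owns a regularly refined patch stored contiguously, so the inner smoother loops run with structured, unit-stride indexing. The names (Patch, HybridGrid, smooth_patch) and the layout are illustrative assumptions, not the actual HHG data structures.

```cpp
// Minimal sketch of patch-wise regular refinement (hypothetical names;
// not the actual HHG data structures).
#include <array>
#include <cstddef>
#include <vector>

// One coarse, unstructured element that owns a regularly refined patch.
// The interior unknowns live in a contiguous array, so smoother loops can
// use constant-stride indexing as on a purely structured grid.
struct Patch {
    std::size_t n;                  // interior points per direction after refinement
    std::vector<double> u;          // unknowns stored as a structured (n+2)^2 block,
                                    // including a ghost layer for neighbour coupling
    std::array<int, 4> neighbours;  // indices of adjacent coarse elements (-1 = boundary)

    double& at(std::size_t i, std::size_t j) { return u[i * (n + 2) + j]; }
};

// A structured, Jacobi-like sweep over one patch: unit-stride inner loop,
// no indirect addressing, which is what enables near-structured performance.
void smooth_patch(Patch& p, double omega) {
    for (std::size_t i = 1; i <= p.n; ++i)
        for (std::size_t j = 1; j <= p.n; ++j)
            p.at(i, j) = (1.0 - omega) * p.at(i, j)
                       + omega * 0.25 * (p.at(i - 1, j) + p.at(i + 1, j)
                                       + p.at(i, j - 1) + p.at(i, j + 1));
}

// The unstructured part of the hierarchy: a flat list of patches plus the
// coarse-grid connectivity needed to exchange ghost values between them.
struct HybridGrid {
    std::vector<Patch> patches;
};
```

The design point illustrated here is that unstructured connectivity is confined to the coarse level, while virtually all unknowns live inside structured patches.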

Cited by 54 publications (85 citation statements)
References 5 publications
“…As the exchange and the optimization of the kernels for the various simulation scenarios and hardware is one feature of our sweep concept, good performance results are obtained with the WaLBerla framework. In this respect, the framework benefits from our long-term experience in the optimization of numerical codes for serial [40][41][42][43] as well as parallel large scale simulations [44]. Performance results for WaLBerla can be found in [45].…”
Section: Efficiency and Scalability
confidence: 99%
“…Serendipitously, this special processing order does not have a significant effect on the overall convergence of the solver; cf. [6]. For the new class of machines, additionally OpenMP parallelization was implemented to improve the node-level parallelism.…”
Section: Pressure-correction Scheme
confidence: 99%
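As a rough illustration of the node-level OpenMP parallelization mentioned in the statement above, the following sketch distributes patch sweeps (reusing the Patch and smooth_patch types from the sketch after the abstract) across the cores of a node. The function name and scheduling choice are assumptions for illustration, not code taken from the cited solver.

```cpp
// Hypothetical illustration of adding OpenMP node-level parallelism on top of
// the patch-wise sweeps; not taken from the cited solver.
#include <cstddef>
#include <vector>

void smooth_all(std::vector<Patch>& patches, double omega, int sweeps) {
    for (int s = 0; s < sweeps; ++s) {
        // Within one sweep, patches are treated as independent (coupling is
        // handled via ghost layers), so the loop over patches can be spread
        // across the cores of a node. This changes the effective processing
        // order relative to a strict sequential sweep, which is the effect
        // the citing paper notes has little impact on overall convergence.
        #pragma omp parallel for schedule(static)
        for (std::size_t p = 0; p < patches.size(); ++p)
            smooth_patch(patches[p], omega);
        // Ghost-layer exchange between patches would go here.
    }
}
```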
“…The SR8000 is designed to be efficient even on codes that do not exhibit good spatial locality, but only when it can schedule the memory access using preload instructions. This makes the SR8000 extremely efficient on structured codes, where it is capable of achieving more than 50% of its Rpeak [2]. However, this architecture is also very sensitive to indirection, and, if it is not possible for the SR8000 to make efficient use of PVP, performance suffers.…”
Section: 11
confidence: 96%
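To illustrate the access-pattern distinction behind this observation, the following generic C++ fragment contrasts a structured, unit-stride loop, whose addresses a compiler can preload far ahead of the arithmetic, with an indirectly addressed loop typical of unstructured-grid codes. It is a schematic example, not SR8000-specific code.

```cpp
// Hedged illustration of the access-pattern difference discussed above
// (generic code; not specific to the SR8000 or its preload instructions).
#include <cstddef>
#include <vector>

// Structured access: consecutive, predictable addresses that lend
// themselves to software prefetch/preload well ahead of use.
double structured_sum(const std::vector<double>& a) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
        s += a[i];
    return s;
}

// Indirect access through an index array, as in unstructured-grid codes:
// the target address is only known once idx[i] has been loaded, which
// defeats long-range preloading and typically lowers sustained performance.
double indirect_sum(const std::vector<double>& a,
                    const std::vector<std::size_t>& idx) {
    double s = 0.0;
    for (std::size_t i = 0; i < idx.size(); ++i)
        s += a[idx[i]];
    return s;
}
```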