Hardware-Based Efficiency Advances in the EXA-DUNE Project

Bastian, Peter; Engwer, Christian; Fahlke, Jorrit; Geveler, Markus; Göddeke, Dominik; Iliev, Oleg; Ippisch, Olaf; Milk, René; Möhring, Jan; Müthing, Steffen; Ohlberger, Mario; Ribbrock, Dirk; Turek, Stefan

doi:10.1007/978-3-319-40528-5_1

Cited by 13 publications

(11 citation statements)

References 15 publications

“…The higher order of the derivatives in the Laplacian implies more work, in particular for the face integrals where both the values and the gradients must be computed from Eq. (7). In Table 2, we count the interpolation of the values and normal derivatives as two invocations to a face normal interpolation to quantify the increased cost, even though they are implemented by a single pass through the data.…”

Section: Tensor Product Algorithms (mentioning)

confidence: 99%

“…Design Choice 4: In order to simplify implementation and re-use 2D kernels, the local coordinate system on faces is always set such that reference cell gradients touch the d − 1 tangential directions first and the face-normal direction comes last by adjusting the order of components in the geometry tensors rather than changing indices of evaluators, see Eq. (7). Data Structure 1 lists a slim way of storing a pair of faces in case of vectorization.…”

Section: Vectorization Layout For Face Integrals (mentioning)

confidence: 99%

“…Tensor product evaluation has been a very active research area with implementations available in the generic finite element software packages DUNE [8,7], Firedrake [45,41,38], Loopy [25], mfem [27], Nek5000 [14], Nektar++ [10], or NGSolve [47] as well as application codes such as the compressible flow solver framework Flexi [20], SPECFEM3D [28] or pTatin3D [40]. Despite the wide availability of software, including code generators and domain-specific languages in Firedrake and Loopy, we believe that the analysis of high performance computing aspects and the expected performance envelopes of operator evaluation, independent of the user interfaces, are still missing.…”

Section: Introduction (mentioning)

confidence: 99%

Fast Matrix-Free Evaluation of Discontinuous Galerkin Finite Element Operators

Kronbichler

Kormann

2019

ACM Trans. Math. Softw.

146

We present an algorithmic framework for matrix-free evaluation of discontinuous Galerkin finite element operators based on sum factorization on quadrilateral and hexahedral meshes. We identify a set of kernels for fast quadrature on cells and faces targeting a wide class of weak forms originating from linear and nonlinear partial differential equations. Different algorithms and data structures for the implementation of operator evaluation are compared in an in-depth performance analysis. The sum factorization kernels are optimized by vectorization over several cells and faces and an even-odd decomposition of the one-dimensional compute kernels. In isolation our implementation then reaches up to 60% of arithmetic peak on Intel Haswell and Broadwell processors and up to 50% of arithmetic peak on Intel Knights Landing. The full operator evaluation reaches only about half that throughput due to memory bandwidth limitations from loading the input and output vectors, MPI ghost exchange, as well as handling variable coefficients and the geometry. Our performance analysis shows that the results are often within 10% of the available memory bandwidth for the proposed implementation, with the exception of the Cartesian mesh case where the cost of gather operations and MPI communication are more substantial.

[...] Höchstleistungsrechnen (KONWIHR) in the framework of the project Matrix-free GPU kernels for complex applications in fluid dynamics. The authors gratefully acknowledge the Gauss Centre for Supercomputing e.V. (www.gauss-centre.eu) for funding this project by providing computing time on the GCS Supercomputer SuperMUC at Leibniz Supercomputing Centre (LRZ, www.lrz.de) through project id pr83te.

“…The concept of matrix-free evaluation with sum factorization has been widely adopted by now, like in the deal.II [1], DUNE [5,40,60], Firedrake [63], mfem [2], Nek5000 [28] or Nektar++ [13] projects. These fast evaluation techniques are directly applicable to explicit time stepping schemes, as we have demonstrated for wave propagation in [42,53,65,66,67,68] and the compressible Navier-Stokes equations [24].…”

Section: Implementation Of Sum Factorization In the deal.II Library (mentioning)

confidence: 99%

ExaDG: High-Order Discontinuous Galerkin for the Exa-Scale

Arndt

Fehn

Kanschat

et al.

2020

Lecture Notes in Computational Science and Engineering

This text presents contributions to efficient high-order finite element solvers in the context of the project ExaDG, part of the DFG priority program 1648 Software for Exascale Computing (SPPEXA). The main algorithmic components are the matrix-free evaluation of finite element and discontinuous Galerkin operators with sum factorization to reach a high node-level performance and parallel scalability, a massively parallel multigrid framework, and efficient multigrid smoothers. The algorithms have been applied in a computational fluid dynamics context. The software contributions of the project have led to a speedup by a factor of 3–4 depending on the hardware. Our implementations are available via the deal.II finite element library.

“…Certainly the combination of using explicit time-stepping for high-order methods alongside performance portable and architecture independent methods running on both CPUs and GPUs has been demonstrated in solvers such as PyFR [13]. Additionally, work by other finite element groups such as deal.II [14] and Dune [15] on mostly tensor product quadrilateral and hexahedral elements has been conducted, but typically from the CPU perspective. In this work, we aim to demonstrate how architecture-independent programming can be applied from the context of implicit time-stepping.…”

Section: Introduction (mentioning)

confidence: 99%