2006
DOI: 10.1002/nme.1557

Performance comparison of data-reordering algorithms for sparse matrix–vector multiplication in edge-based unstructured grid computations

Abstract: Several performance improvements for finite-element edge-based sparse matrix-vector multiplication algorithms on unstructured grids are presented and tested. Edge data structures for tetrahedral meshes and triangular interface elements are treated, focusing on nodal and edge renumbering strategies for improving processor and memory hierarchy use. Benchmark computations on Intel Itanium 2 and Pentium IV processors are performed. The results show CPU-time performance improvements by factors ranging from 2 to 3.
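
To make the edge-based kernel concrete, the following is a minimal sketch (not the authors' code; the array names and the one-pair-of-coefficients-per-edge layout are assumptions) of an edge-based matrix-vector product for a scalar problem, with the diagonal stored per node and each edge holding its two node indices and two off-diagonal coefficients:

    /* Minimal sketch of an edge-based matrix-vector product y = A*x
     * for a scalar problem on an unstructured grid.
     * Assumed (illustrative) data layout:
     *  - diag[i]        : diagonal coefficient of node i,
     *  - n1[e], n2[e]   : the two nodes connected by edge e,
     *  - a12[e], a21[e] : off-diagonal coefficients (row n1, col n2) and
     *                     (row n2, col n1) associated with edge e.
     */
    void edge_spmv(int nnodes, int nedges,
                   const int *n1, const int *n2,
                   const double *diag, const double *a12, const double *a21,
                   const double *x, double *y)
    {
        /* nodal (diagonal) contribution */
        for (int i = 0; i < nnodes; ++i)
            y[i] = diag[i] * x[i];

        /* edge (off-diagonal) contribution: indirect gather/scatter */
        for (int e = 0; e < nedges; ++e) {
            int i = n1[e], j = n2[e];
            y[i] += a12[e] * x[j];
            y[j] += a21[e] * x[i];
        }
    }

The two indirect updates per edge in the second loop are exactly the memory accesses whose pattern the nodal and edge renumbering strategies aim to make cache friendly.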

Cited by 29 publications (29 citation statements); references 20 publications.

Citation statements, ordered by relevance:
“…Most of the computational effort spent in this solution procedure is due to the matrix-vector products within the GMRES driver for both flow and marker. To improve the computational efficiency with respect to standard element-by-element and sparse matrix-vector storage schemes, we adopt an edge-based data structure in order to minimize indirect memory addressing, diminish floating point operation counts (flops) and memory requirements, as described in Elias et al [20] and Coutinho et al [19] for both the Navier-Stokes equations and the marking function advection. Further computational gains are obtained from data preprocessing performed by the EdgePack library, a package to improve cache reutilization based on reordering and grouping techniques [40].…”
Section: Solution Procedures (mentioning)
confidence: 99%
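
For contrast with the edge-based kernel sketched after the abstract, a standard compressed sparse row (CSR) product stores every nonzero coefficient explicitly and gathers x through a column-index array on each row; an edge-based layout stores one pair of off-diagonal coefficients per edge instead, which is what cuts the indexing overhead and memory traffic the quotation refers to. A minimal, illustrative CSR kernel (not taken from any of the cited codes):

    /* Standard CSR matrix-vector product y = A*x, shown only for contrast
     * with the edge-based kernel above; array names are illustrative. */
    void csr_spmv(int nrows, const int *rowptr, const int *colind,
                  const double *val, const double *x, double *y)
    {
        for (int i = 0; i < nrows; ++i) {
            double sum = 0.0;
            for (int k = rowptr[i]; k < rowptr[i + 1]; ++k)
                sum += val[k] * x[colind[k]];   /* indirect access to x */
            y[i] = sum;
        }
    }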
“…The main characteristics of our incompressible flow solver [19][20][21] are: SUPG and pressure-stabilizing/Petrov-Galerkin (PSPG) [3,22] stabilized finite element formulation; implicit time marching scheme with adaptive time stepping control; advanced inexact Newton solvers; edge-based data structures to save memory and improve performance; support to message passing and shared memory parallel programming models; and large eddy simulation (LES) extensions using a classical Smagorinsky model. We introduce VOF extensions in this flow solver to track the evolving free surface [11,13,14].…”
Section: Introduction (mentioning)
confidence: 99%
“…The computations are performed in parallel using a distributed-memory paradigm through the Message Passing Interface library. The parallel partitions are generated by the Metis library [44], whereas the information regarding the edges of the computational grid is obtained from the EdgePack library as described in [45]. EdgePack also reorders nodes, edges and elements to improve data locality, efficiently exploiting the memory hierarchy of current processors.…”
Section: Solution Procedures (mentioning)
confidence: 99%
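
EdgePack's actual reordering and grouping algorithms are the ones described in [45]; purely as an illustration of the kind of preprocessing involved, one simple locality-oriented step is to sort the edge list by the node indices each edge touches, so that consecutive edges reuse nearby entries of the nodal arrays. The Edge struct below is an assumed layout, not EdgePack's:

    /* Illustrative edge reordering (not EdgePack's actual algorithm):
     * sort the edge list lexicographically by (min node, max node) so that
     * edges touching the same nodes are visited consecutively, which tends
     * to keep the corresponding x[] and y[] entries in cache during the
     * edge loop of the matrix-vector product. */
    #include <stdlib.h>

    typedef struct { int n1, n2; } Edge;

    static int edge_cmp(const void *pa, const void *pb)
    {
        const Edge *a = (const Edge *)pa, *b = (const Edge *)pb;
        int alo = a->n1 < a->n2 ? a->n1 : a->n2;
        int blo = b->n1 < b->n2 ? b->n1 : b->n2;
        if (alo != blo) return alo - blo;
        int ahi = a->n1 < a->n2 ? a->n2 : a->n1;
        int bhi = b->n1 < b->n2 ? b->n2 : b->n1;
        return ahi - bhi;
    }

    void reorder_edges_for_locality(Edge *edges, int nedges)
    {
        qsort(edges, (size_t)nedges, sizeof(Edge), edge_cmp);
    }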
“…Regarding the cache sharing scheme, older Intel Xeon processors, although behaving as quad-core chips, are in fact two dual-core processors put together. Mesh entities are ordered to improve data locality as described in [1].…”
Section: Performance Tests (mentioning)
confidence: 99%
“…It is important to remember that EdgeCFD's main kernels (matrix-vector product, stiffness matrix build-up and assembly of element residuals) rely strongly on indirect memory addressing operations and are thus influenced by how mesh entities are accessed and used during these operations. In EdgeCFD, mesh entities are reordered to make efficient use of cache memory, as explained in detail in [1]. However, due to the complexity of the software's main loops, cache misses are expected even for reordered meshes.…”
Section: Fig. 2 Speedup For Two Xeon Systems Running Up To 8 Intra-N… (mentioning)
confidence: 99%
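
As a rough sketch of what a nodal renumbering for data locality can look like (not the specific strategies compared in [1]), a breadth-first renumbering in the spirit of Cuthill-McKee gives neighbouring nodes nearby new indices, so the nodal arrays touched by one edge or element tend to fall in the same cache lines. The CSR-like adjacency arrays (xadj, adj) are assumed inputs:

    /* Minimal BFS-based nodal renumbering (in the spirit of bandwidth-
     * reducing orderings such as Cuthill-McKee; not necessarily the
     * strategy used in [1]).
     * Inputs : n nodes, adjacency in CSR-like form (xadj, adj).
     * Output : perm[old] = new index. */
    #include <stdlib.h>

    void bfs_renumber(int n, const int *xadj, const int *adj, int *perm)
    {
        int *queue = malloc((size_t)n * sizeof *queue);
        for (int i = 0; i < n; ++i) perm[i] = -1;   /* -1 = not yet numbered */

        int next = 0;
        for (int seed = 0; seed < n; ++seed) {      /* handle disconnected meshes */
            if (perm[seed] != -1) continue;
            int head = 0, tail = 0;
            queue[tail++] = seed;
            perm[seed] = next++;
            while (head < tail) {
                int v = queue[head++];
                for (int k = xadj[v]; k < xadj[v + 1]; ++k) {
                    int w = adj[k];
                    if (perm[w] == -1) {
                        perm[w] = next++;
                        queue[tail++] = w;
                    }
                }
            }
        }
        free(queue);
    }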