2013
DOI: 10.1007/978-1-4614-7172-1_5
Parallel Unsmoothed Aggregation Algebraic Multigrid Algorithms on GPUs

Abstract: We design and implement a parallel algebraic multigrid method for isotropic graph Laplacian problems on multicore Graphical Processing Units (GPUs). The proposed AMG method is based on the aggregation framework. The setup phase of the algorithm uses a parallel maximal independent set algorithm to form aggregates, and the resulting coarse-level hierarchy is then used in a K-cycle iteration solve phase with an ℓ1-Jacobi smoother. Numerical tests of a parallel implementation of the method for graphics processors…
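The abstract names two ingredients of the setup and solve phases: aggregates seeded by a parallel maximal independent set (MIS) of the matrix graph, and an ℓ1-Jacobi smoother. The following is a minimal serial NumPy/SciPy sketch of both ideas, for orientation only; it is not the authors' CUDA implementation, and the greedy MIS below is sequential rather than the parallel variant used in the paper.

```python
# Serial sketch of MIS-seeded aggregation and an l1-Jacobi sweep.
# Illustrative reconstruction under stated assumptions, not the paper's code.
import numpy as np
import scipy.sparse as sp

def maximal_independent_set(A):
    """Greedy MIS on the graph of a sparse matrix with symmetric pattern."""
    A = A.tocsr()
    n = A.shape[0]
    state = np.zeros(n, dtype=np.int8)        # 0 = undecided, 1 = in MIS, -1 = excluded
    for i in range(n):
        if state[i] == 0:
            state[i] = 1                       # select i
            nbrs = A.indices[A.indptr[i]:A.indptr[i + 1]]
            state[nbrs[nbrs != i]] = -1        # exclude its neighbours
    return np.flatnonzero(state == 1)

def aggregate(A):
    """Attach every vertex to the aggregate of a nearby MIS root; return binary P."""
    A = A.tocsr()
    n = A.shape[0]
    roots = maximal_independent_set(A)
    agg = -np.ones(n, dtype=np.int64)
    agg[roots] = np.arange(len(roots))
    for i in range(n):
        if agg[i] < 0:
            nbrs = A.indices[A.indptr[i]:A.indptr[i + 1]]
            assigned = nbrs[agg[nbrs] >= 0]    # maximality of the MIS guarantees this is non-empty
            agg[i] = agg[assigned[0]]
    # Unsmoothed-aggregation prolongation: a binary matrix with one 1 per row.
    P = sp.csr_matrix((np.ones(n), (np.arange(n), agg)), shape=(n, len(roots)))
    return P

def l1_jacobi(A, b, x, sweeps=1):
    """l1-Jacobi smoother: the diagonal of row-wise l1 norms keeps the sweep convergent for SPD A."""
    d = np.asarray(abs(A).sum(axis=1)).ravel()    # d_i = sum_j |a_ij|
    for _ in range(sweeps):
        x = x + (b - A @ x) / d
    return x
```

Under these assumptions, P is the kind of binary prolongation referred to in the citing excerpts below, and l1_jacobi would play the role of the relaxation inside the K-cycle.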

Cited by 19 publications (18 citation statements). References 25 publications.
“…Each row of the SPE10 matrix has 13.7 non-zero entries on average, while l (the row length of the ELL matrix) is set to 14 automatically by formula (7). From the table, we can see that when there is no ELL matrix, the performance is the poorest.…”
Section: Sparse Matrix-Vector Multiplication
confidence: 99%
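This excerpt refers to a hybrid sparse-matrix layout in which an ELL block of fixed row width l (chosen from the average number of non-zeros per row) carries most entries and the overflow is stored separately. The sketch below illustrates that idea in plain NumPy/SciPy; formula (7) of the cited paper is not reproduced here, so rounding the mean row length up is only a stand-in, and keeping the overflow in COO form is purely for illustration.

```python
# Illustrative ELL + COO split of a CSR matrix and the corresponding matvec.
import numpy as np
import scipy.sparse as sp

def to_hybrid(A, l=None):
    A = A.tocsr()
    n = A.shape[0]
    row_len = np.diff(A.indptr)
    if l is None:
        l = int(np.ceil(row_len.mean()))           # e.g. 13.7 non-zeros per row -> l = 14
    ell_val = np.zeros((n, l))
    ell_col = np.zeros((n, l), dtype=np.int64)     # padded slots: column 0 with value 0
    coo_r, coo_c, coo_v = [], [], []
    for i in range(n):
        s, e = A.indptr[i], A.indptr[i + 1]
        k = min(l, e - s)
        ell_val[i, :k] = A.data[s:s + k]
        ell_col[i, :k] = A.indices[s:s + k]
        for j in range(s + k, e):                  # overflow of long rows goes to the COO part
            coo_r.append(i); coo_c.append(A.indices[j]); coo_v.append(A.data[j])
    coo = sp.coo_matrix((coo_v, (coo_r, coo_c)), shape=A.shape)
    return ell_val, ell_col, coo

def hybrid_matvec(ell_val, ell_col, coo, x):
    # The fixed-width ELL part is what gives regular, coalesced access on a GPU;
    # here it is just a vectorised gather.  The COO part handles the long rows.
    y = (ell_val * x[ell_col]).sum(axis=1)
    y += coo @ x
    return y
```

The measurement quoted above (poorest performance with no ELL matrix) corresponds in this sketch to taking l = 0, which pushes every entry into the irregular overflow structure.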
“…In [14] the authors present a GPU implementation of an unsmoothed aggregation-based AMG, where the focus is both on implementing an efficient parallel algorithm for computing a maximal independent set of variables, specifically tuned for the standard isotropic graph Laplacians arising from second-order elliptic PDEs, and on simplifying the Galerkin triple-matrix multiplication. Indeed, when standard unsmoothed aggregation is employed, the prolongation operator is a binary matrix and the Galerkin multiplication reduces to summations of entries in the matrix at the finer level, which can be efficiently implemented in CUDA.…”
Section: Related Work
confidence: 99%
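The point about the binary prolongation can be made concrete with a small sketch: if P has exactly one 1 per row, marking the aggregate of each fine vertex, then (P^T A P)_{IJ} is simply the sum of the fine-level entries a_ij with i in aggregate I and j in aggregate J. The helper below is illustrative, not code from either cited paper.

```python
# Galerkin product via summation when the prolongation is a binary aggregation matrix.
import numpy as np
import scipy.sparse as sp

def galerkin_by_summation(A, agg):
    """agg[i] = aggregate index of fine vertex i; returns A_c = P^T A P."""
    A = A.tocoo()
    nc = agg.max() + 1
    # Each fine entry a_ij is accumulated into coarse entry (agg[i], agg[j]);
    # converting to CSR sums the duplicates.
    return sp.coo_matrix((A.data, (agg[A.row], agg[A.col])), shape=(nc, nc)).tocsr()

if __name__ == "__main__":
    # 1-D Laplacian on 6 vertices, partitioned into 3 aggregates of 2.
    A = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(6, 6), format="csr")
    agg = np.array([0, 0, 1, 1, 2, 2])
    P = sp.csr_matrix((np.ones(6), (np.arange(6), agg)), shape=(6, 3))
    assert np.allclose((P.T @ A @ P).toarray(),
                       galerkin_by_summation(A, agg).toarray())
```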
“…That is, any vertex in V_H corresponds to a subgraph in the partitioning, and the edge (i, j) exists in E_H if and only if the i-th and j-th subgraphs are connected in the graph G. The algorithm we use in forming a graph partitioning is a variant of the approach we developed and tested for graphics processing units in [5]. The procedure iteratively applies the following two steps:…”
Section: Subspaces By Graph Partitioning and Graph Matching
confidence: 99%
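The coarse graph described in this excerpt (one coarse vertex per subgraph, and a coarse edge whenever some fine edge joins two different subgraphs) can be built in a few lines. The helper below is a hypothetical illustration of that definition, not code from [5].

```python
# Build the coarse graph G_H = (V_H, E_H) induced by a partition of the fine graph.
def coarse_graph(edges, part):
    """edges: iterable of fine edges (i, j); part[i]: index of the subgraph containing i."""
    E_H = set()
    for i, j in edges:
        I, J = part[i], part[j]
        if I != J:                       # edges inside a single subgraph do not appear in E_H
            E_H.add((min(I, J), max(I, J)))
    V_H = list(range(max(part) + 1))
    return V_H, sorted(E_H)

# Example: the path 0-1-2-3-4-5 partitioned into {0,1}, {2,3}, {4,5}
# collapses to the coarse path 0-1-2.
print(coarse_graph([(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)], [0, 0, 1, 1, 2, 2]))
```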