2013
DOI: 10.1007/978-1-4614-7172-1_5
Parallel Unsmoothed Aggregation Algebraic Multigrid Algorithms on GPUs

Abstract: We design and implement a parallel algebraic multigrid method for isotropic graph Laplacian problems on multicore Graphical Processing Units (GPUs). The proposed AMG method is based on the aggregation framework. The setup phase of the algorithm uses a parallel maximal independent set algorithm to form aggregates, and the resulting coarse-level hierarchy is then used in a K-cycle iteration solve phase with an ℓ1-Jacobi smoother. Numerical tests of a parallel implementation of the method for graphics processors…
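The abstract names two ingredients of the setup and solve phases: aggregates seeded by a parallel maximal independent set (MIS) of the matrix graph, and an ℓ1-Jacobi smoother. The following is a minimal serial NumPy/SciPy sketch of both ideas, for orientation only; it is not the authors' CUDA implementation, and the greedy MIS below is sequential rather than the parallel variant used in the paper.

```python
# Serial sketch of MIS-seeded aggregation and an l1-Jacobi sweep.
# Illustrative reconstruction under stated assumptions, not the paper's code.
import numpy as np
import scipy.sparse as sp

def maximal_independent_set(A):
    """Greedy MIS on the graph of a sparse matrix with symmetric pattern."""
    A = A.tocsr()
    n = A.shape[0]
    state = np.zeros(n, dtype=np.int8)        # 0 = undecided, 1 = in MIS, -1 = excluded
    for i in range(n):
        if state[i] == 0:
            state[i] = 1                       # select i
            nbrs = A.indices[A.indptr[i]:A.indptr[i + 1]]
            state[nbrs[nbrs != i]] = -1        # exclude its neighbours
    return np.flatnonzero(state == 1)

def aggregate(A):
    """Attach every vertex to the aggregate of a nearby MIS root; return binary P."""
    A = A.tocsr()
    n = A.shape[0]
    roots = maximal_independent_set(A)
    agg = -np.ones(n, dtype=np.int64)
    agg[roots] = np.arange(len(roots))
    for i in range(n):
        if agg[i] < 0:
            nbrs = A.indices[A.indptr[i]:A.indptr[i + 1]]
            assigned = nbrs[agg[nbrs] >= 0]    # maximality of the MIS guarantees this is non-empty
            agg[i] = agg[assigned[0]]
    # Unsmoothed-aggregation prolongation: a binary matrix with one 1 per row.
    P = sp.csr_matrix((np.ones(n), (np.arange(n), agg)), shape=(n, len(roots)))
    return P

def l1_jacobi(A, b, x, sweeps=1):
    """l1-Jacobi smoother: the diagonal of row-wise l1 norms keeps the sweep convergent for SPD A."""
    d = np.asarray(abs(A).sum(axis=1)).ravel()    # d_i = sum_j |a_ij|
    for _ in range(sweeps):
        x = x + (b - A @ x) / d
    return x
```

Under these assumptions, P is the kind of binary prolongation referred to in the citing excerpts below, and l1_jacobi would play the role of the relaxation inside the K-cycle.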

Cited by 19 publications (18 citation statements). References 25 publications.
“…Each row of the SPE10 matrix has 13.7 non-zero entries on average, while l (the row length of the ELL matrix) is set to 14 automatically by formula (7). From the table, we can see that when there is no ELL matrix, the performance is the poorest.…”
Section: Sparse Matrix-Vector Multiplication
confidence: 99%
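This excerpt refers to a hybrid sparse-matrix layout in which an ELL block of fixed row width l (chosen from the average number of non-zeros per row) carries most entries and the overflow is stored separately. The sketch below illustrates that idea in plain NumPy/SciPy; formula (7) of the cited paper is not reproduced here, so rounding the mean row length up is only a stand-in, and keeping the overflow in COO form is purely for illustration.

```python
# Illustrative ELL + COO split of a CSR matrix and the corresponding matvec.
import numpy as np
import scipy.sparse as sp

def to_hybrid(A, l=None):
    A = A.tocsr()
    n = A.shape[0]
    row_len = np.diff(A.indptr)
    if l is None:
        l = int(np.ceil(row_len.mean()))           # e.g. 13.7 non-zeros per row -> l = 14
    ell_val = np.zeros((n, l))
    ell_col = np.zeros((n, l), dtype=np.int64)     # padded slots: column 0 with value 0
    coo_r, coo_c, coo_v = [], [], []
    for i in range(n):
        s, e = A.indptr[i], A.indptr[i + 1]
        k = min(l, e - s)
        ell_val[i, :k] = A.data[s:s + k]
        ell_col[i, :k] = A.indices[s:s + k]
        for j in range(s + k, e):                  # overflow of long rows goes to the COO part
            coo_r.append(i); coo_c.append(A.indices[j]); coo_v.append(A.data[j])
    coo = sp.coo_matrix((coo_v, (coo_r, coo_c)), shape=A.shape)
    return ell_val, ell_col, coo

def hybrid_matvec(ell_val, ell_col, coo, x):
    # The fixed-width ELL part is what gives regular, coalesced access on a GPU;
    # here it is just a vectorised gather.  The COO part handles the long rows.
    y = (ell_val * x[ell_col]).sum(axis=1)
    y += coo @ x
    return y
```

The measurement quoted above (poorest performance with no ELL matrix) corresponds in this sketch to taking l = 0, which pushes every entry into the irregular overflow structure.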
“…In [14] the authors present a GPU implementation of an unsmoothed aggregation-based AMG, where the focus is both on implementing an efficient parallel algorithm for computing a maximal independent set of variables, specifically tuned for the standard isotropic graph Laplacians arising from second-order elliptic PDEs, and on simplifying the Galerkin triple-matrix multiplication. Indeed, when standard unsmoothed aggregation is employed, the prolongation operator is a binary matrix and the Galerkin multiplication reduces to summations of entries in the matrix at the finer level, which can be efficiently implemented in CUDA.…”
Section: Related Work
confidence: 99%
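The point about the binary prolongation can be made concrete with a small sketch: if P has exactly one 1 per row, marking the aggregate of each fine vertex, then (P^T A P)_{IJ} is simply the sum of the fine-level entries a_ij with i in aggregate I and j in aggregate J. The helper below is illustrative, not code from either cited paper.

```python
# Galerkin product via summation when the prolongation is a binary aggregation matrix.
import numpy as np
import scipy.sparse as sp

def galerkin_by_summation(A, agg):
    """agg[i] = aggregate index of fine vertex i; returns A_c = P^T A P."""
    A = A.tocoo()
    nc = agg.max() + 1
    # Each fine entry a_ij is accumulated into coarse entry (agg[i], agg[j]);
    # converting to CSR sums the duplicates.
    return sp.coo_matrix((A.data, (agg[A.row], agg[A.col])), shape=(nc, nc)).tocsr()

if __name__ == "__main__":
    # 1-D Laplacian on 6 vertices, partitioned into 3 aggregates of 2.
    A = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(6, 6), format="csr")
    agg = np.array([0, 0, 1, 1, 2, 2])
    P = sp.csr_matrix((np.ones(6), (np.arange(6), agg)), shape=(6, 3))
    assert np.allclose((P.T @ A @ P).toarray(),
                       galerkin_by_summation(A, agg).toarray())
```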
“…That is, any vertex in V_H corresponds to a subgraph in the partitioning, and the edge (i, j) exists in E_H if and only if the i-th and j-th subgraphs are connected in the graph G. The algorithm we use in forming a graph partitioning is a variant of the approach we developed and tested for graphics processing units in [5]. The procedure iteratively applies the following two steps:…”
Section: Subspaces By Graph Partitioning and Graph Matching
confidence: 99%
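The coarse graph described in this excerpt (one coarse vertex per subgraph, and a coarse edge whenever some fine edge joins two different subgraphs) can be built in a few lines. The helper below is a hypothetical illustration of that definition, not code from [5].

```python
# Build the coarse graph G_H = (V_H, E_H) induced by a partition of the fine graph.
def coarse_graph(edges, part):
    """edges: iterable of fine edges (i, j); part[i]: index of the subgraph containing i."""
    E_H = set()
    for i, j in edges:
        I, J = part[i], part[j]
        if I != J:                       # edges inside a single subgraph do not appear in E_H
            E_H.add((min(I, J), max(I, J)))
    V_H = list(range(max(part) + 1))
    return V_H, sorted(E_H)

# Example: the path 0-1-2-3-4-5 partitioned into {0,1}, {2,3}, {4,5}
# collapses to the coarse path 0-1-2.
print(coarse_graph([(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)], [0, 0, 1, 1, 2, 2]))
```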