2013
DOI: 10.1016/j.jpdc.2012.10.003
Revisiting parallel cyclic reduction and parallel prefix-based algorithms for block tridiagonal systems of equations

Cited by 17 publications (23 citation statements) · References 18 publications
“…An MPI parallel implementation of cyclic reduction for block-tridiagonal systems was considered in [6], [12]. A block cyclic reduction solver on GPU was used in [1] in the context of a CFD application, with block sizes up to 32.…”
Section: Kernel: Linear System Solves
confidence: 99%
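The block cyclic reduction discussed in the quote operates on M×M blocks; a minimal scalar (M = 1) sketch of the parallel-cyclic-reduction idea is given below. All function and variable names are my own illustration, not taken from the cited implementations:

```python
import numpy as np

def pcr_solve(a, b, c, d):
    """Parallel cyclic reduction for a scalar tridiagonal system T x = d.

    a: sub-diagonal (a[0] must be 0), b: main diagonal,
    c: super-diagonal (c[-1] must be 0), d: right-hand side.
    In each step, every row eliminates its neighbors at distance s,
    doubling the coupling stride; after ceil(log2 n) steps each row
    is decoupled and x[i] = d[i] / b[i].
    """
    n = len(b)
    a, b, c, d = (np.array(v, dtype=float) for v in (a, b, c, d))
    s = 1
    while s < n:
        na, nb, nc, nd = a.copy(), b.copy(), c.copy(), d.copy()
        for i in range(n):
            lo, hi = i - s, i + s
            alpha = -a[i] / b[lo] if lo >= 0 else 0.0   # eliminate x[i-s]
            gamma = -c[i] / b[hi] if hi < n else 0.0    # eliminate x[i+s]
            na[i] = alpha * a[lo] if lo >= 0 else 0.0
            nc[i] = gamma * c[hi] if hi < n else 0.0
            nb[i] = (b[i]
                     + (alpha * c[lo] if lo >= 0 else 0.0)
                     + (gamma * a[hi] if hi < n else 0.0))
            nd[i] = (d[i]
                     + (alpha * d[lo] if lo >= 0 else 0.0)
                     + (gamma * d[hi] if hi < n else 0.0))
        a, b, c, d = na, nb, nc, nd
        s *= 2
    return d / b
```

In the block variant, the scalar divisions above become M×M matrix inversions and the products become matrix-matrix multiplies, which is where the M³ cost factors quoted later come from.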
“…• It was recently shown that the RDA performs better than the CRA depending on the combination of block size (M), number of block rows (N) and processor count (P) specific to the problem at hand, in addition to certain machine-specific constants [6]. • By design, the RDA is more load balanced (and scalable at large P) than the CRA, as the number of active processors halves in every step of the CRA.…”
Section: A Motivation
confidence: 99%
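The load-balance contrast the quote draws can be made concrete with a small counting sketch; the function names are mine, not from [6]:

```python
import math

def cra_active(P):
    """Active processor counts per forward-reduction step of cyclic
    reduction (CRA): the working set halves at every step."""
    counts, p = [], P
    while p > 1:
        p //= 2
        counts.append(p)
    return counts

def rda_active(P):
    """Active processor counts per step of recursive doubling (RDA):
    all P processors stay busy in each of the log2(P) steps."""
    return [P] * int(math.log2(P))
```

For P = 16 this gives [8, 4, 2, 1] for the CRA versus [16, 16, 16, 16] for the RDA, illustrating why the RDA's work stays balanced while the CRA progressively idles processors.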
“…Scalability: One of the first such studies was reported very recently in [6]. The odd-even cyclic reduction and the prefix-computation-based recursive doubling algorithms were reanalyzed carefully to study the effect of block sizes on their parallel scalability.…”
Section: B Related Work
confidence: 99%
“…Parallel cyclic reduction is efficient when the block size M is small and the spatial operator L is dense. Its estimated computation time is 16(C_in + 6 C_mm) M^3 (N/n + log N) + 2β M^2 (log(N/n) + 2 log n), where C_in and C_mm are the amortized times per floating-point operation for matrix inversion and matrix-matrix multiplication, respectively, and β is the average time to transmit one floating-point number between any two processing elements across the network.…”
Section: Acknowledgments
confidence: 99%
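Under one plausible reading of the cost expression above (the grouping of terms into a compute part and a communication part is my reconstruction of the garbled original), the model can be evaluated numerically. The function name and all parameter values here are assumptions for illustration:

```python
import math

def pcr_time(M, N, n, C_in, C_mm, beta):
    """Sketch of the quoted cost model for parallel cyclic reduction.

    M: block size, N: number of block rows, n: processor count,
    C_in / C_mm: amortized time per flop for matrix inversion and
    matrix-matrix multiplication, beta: per-word transfer time.
    Term grouping is an assumed reading of the quoted expression.
    """
    comp = 16 * (C_in + 6 * C_mm) * M**3 * (N / n + math.log2(N))
    comm = 2 * beta * M**2 * (math.log2(N / n) + 2 * math.log2(n))
    return comp + comm
```

With flop times around a nanosecond and word-transfer times around a microsecond, the model shows the expected trade-off: the M³ compute term shrinks as n grows while the M² communication term grows with log n.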