An Efficient Parallel Algorithm for the Solution of a Tridiagonal Linear System of Equations

Stone, Harold S.

doi:10.1145/321738.321741

Cited by 274 publications

(105 citation statements)

References 4 publications


“…Both parallelization approaches are based on classical cyclic reduction [15]. Two other parallel algorithms for the solution of tridiagonal equation systems are parallel cyclic reduction [16] and recursive doubling [17]. Recently, Zhang et al [18] discussed the applicability of these algorithms on modern GPUs.…”

Section: Introduction (mentioning)

confidence: 99%

Cyclic Reduction Tridiagonal Solvers on GPUs Applied to Mixed-Precision Multigrid

Göddeke

Strzodka

2011

IEEE Trans. Parallel Distrib. Syst.


Abstract. We have previously suggested mixed precision iterative solvers specifically tailored to the iterative solution of sparse linear equation systems as they typically arise in the finite element discretization of partial differential equations. These schemes have been evaluated for a number of hardware platforms, in particular single precision GPUs as accelerators to the general purpose CPU. This paper reevaluates the situation with new mixed precision solvers that run entirely on the GPU: We demonstrate that mixed precision schemes constitute a significant performance gain over native double precision. Moreover, we present a new implementation of cyclic reduction for the parallel solution of tridiagonal systems and employ this scheme as a line relaxation smoother in our GPU-based multigrid solver. With an alternating direction implicit variant of this advanced smoother we can extend the applicability of the GPU multigrid solvers to very ill-conditioned systems arising from the discretization on anisotropic meshes, which previously had to be solved on the CPU. The resulting mixed precision schemes are always faster than double precision alone, and outperform tuned CPU solvers consistently by almost an order of magnitude.
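To make the cyclic reduction idea mentioned in this abstract concrete, here is a minimal sequential sketch in Python. It is not the authors' GPU implementation: the function name `cyclic_reduction` and the storage convention (sub-diagonal `a`, diagonal `b`, super-diagonal `c`, with `a[0]` and `c[-1]` set to zero) are assumptions made for illustration, and no pivoting is done, so a diagonally dominant system is assumed.

```python
def cyclic_reduction(a, b, c, d):
    """Solve a tridiagonal system by classical cyclic reduction.

    Row i reads a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i].
    Assumed convention: a[0] == 0 and c[-1] == 0.
    """
    n = len(b)
    if n == 1:
        return [d[0] / b[0]]

    # Forward reduction: eliminate the even-indexed unknowns (0-based),
    # leaving a tridiagonal system of about half the size in the odd ones.
    # Every loop iteration is independent, which is what a GPU exploits.
    ra, rb, rc, rd = [], [], [], []
    for i in range(1, n, 2):
        alpha = -a[i] / b[i - 1]
        na = alpha * a[i - 1]
        nb = b[i] + alpha * c[i - 1]
        nd = d[i] + alpha * d[i - 1]
        nc = 0.0
        if i + 1 < n:
            gamma = -c[i] / b[i + 1]
            nb += gamma * a[i + 1]
            nd += gamma * d[i + 1]
            nc = gamma * c[i + 1]
        ra.append(na)
        rb.append(nb)
        rc.append(nc)
        rd.append(nd)

    # Solve the reduced system recursively (about log2(n) levels in total).
    x_odd = cyclic_reduction(ra, rb, rc, rd)

    # Back substitution: recover the even-indexed unknowns from their rows.
    x = [0.0] * n
    for k, i in enumerate(range(1, n, 2)):
        x[i] = x_odd[k]
    for i in range(0, n, 2):
        left = a[i] * x[i - 1] if i > 0 else 0.0
        right = c[i] * x[i + 1] if i + 1 < n else 0.0
        x[i] = (d[i] - left - right) / b[i]
    return x


if __name__ == "__main__":
    # Hypothetical 7x7 test system with diagonal 4 and off-diagonals -1.
    n = 7
    a = [0.0] + [-1.0] * (n - 1)
    b = [4.0] * n
    c = [-1.0] * (n - 1) + [0.0]
    d = [1.0] * n
    print(cyclic_reduction(a, b, c, d))
```

Each reduction level halves the number of unknowns, so the amount of available parallelism shrinks as the recursion deepens; that trade-off is one reason the citing literature also considers parallel cyclic reduction and recursive doubling.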


“…The cyclic reduction, the parallel cyclic reduction [25], the recursive doubling [26], and hybrid algorithms were compared with each other in [5]. All considered implementations utilize the local memory and hold the data in-place.…”

Section: Previous Work on Tridiagonal System Solvers on a GPU (mentioning)

confidence: 99%

Fast Poisson Solvers for Graphics Processing Units

Myllykoski

Rossi

Toivanen

2013

Applied Parallel and Scientific Computing


Abstract. Two block cyclic reduction linear system solvers are considered and implemented using the OpenCL framework. The topics of interest include a simplified scalar cyclic reduction tridiagonal system solver and the impact of increasing the radix-number of the algorithm. Both implementations are tested for the Poisson problem in two and three dimensions, using an Nvidia GTX 580 series GPU and double precision floating-point arithmetic. The numerical results indicate up to a 6-fold speed increase in the case of the two-dimensional problems and up to a 3-fold speed increase in the case of the three-dimensional problems when compared to equivalent CPU implementations run on an Intel Core i7 quad-core CPU. The original publication is available at link.springer.com.


“…The earliest parallel solution methods were designed for solution of fine-grained problems, that is, problems with n ≈ p, where n is the size of the problem and p the number of processors (of a supercomputer), and, also, the methods were based on high-speed solution using tridiagonal solvers. The most known of these methods include the recursive-doubling reduction method of Stone [31] and its improved version [32], the odd-even or cyclic reduction technique of Hockney [33,34], and recently, the prefix scheme by Sun [35,36], which is a variation of the cyclic reduction method. Each of the cited parallel solution method is capable of solving n-dimensional tridiagonal system in O(log n) time using n processors.…”

Section: Introduction (mentioning)

confidence: 99%

On a High-Order Compact Scheme and Its Utilization in Parallel Solution of a Time-Dependent System on a Distributed Memory Processor

Akpan

2007

Lecture Notes in Computer Science


Abstract. The study resulting in this paper applied a parallel algorithm based on a fourth-order compact scheme and suitable for parallel implementation of scientific/engineering systems. The particular system used for demonstration in the study was a time-dependent system solved in parallel on a 2-head-node, 224-compute-node Apple Xserve G5 multiprocessor. The use of the approximation scheme, which necessitated discretizing in both space and time with hx space width and ht time step, produced a linear tridiagonal, almost-Toeplitz system. The solution used p processors with p ranging from 3 to 63. The speedups, sp, approached the limiting value of p only when p was small but yielded poor computation errors, which became progressively better as p increased. The parallel solution is very accurate, having good speedups and accuracies, but only when p is within a reasonable range of values.
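The citation statement for this entry says the fine-grained methods solve an n-dimensional tridiagonal system in O(log n) time using n processors. The following Python sketch illustrates where the logarithmic step count comes from, using parallel cyclic reduction rather than Stone's recursive doubling. The function name `pcr`, the storage convention (`a[0]` and `c[-1]` treated as zero), and the absence of pivoting are assumptions of this illustration; the inner loop over `i` stands in for the n processors, since its iterations do not depend on each other.

```python
def pcr(a, b, c, d):
    """Parallel cyclic reduction, simulated sequentially.

    Row i reads a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i].
    Returns the solution and the number of reduction steps taken.
    """
    n = len(b)
    a, b, c, d = list(a), list(b), list(c), list(d)
    a[0] = 0.0   # no coupling to the left of the first row
    c[-1] = 0.0  # no coupling to the right of the last row

    stride, steps = 1, 0
    while stride < n:
        na, nc = [0.0] * n, [0.0] * n
        nb, nd = list(b), list(d)
        # Each row i is combined with rows i - stride and i + stride.
        # After the step, row i couples x[i] only to x[i +/- 2*stride].
        for i in range(n):  # conceptually one processor per row
            lo, hi = i - stride, i + stride
            if lo >= 0:
                alpha = -a[i] / b[lo]
                na[i] = alpha * a[lo]
                nb[i] += alpha * c[lo]
                nd[i] += alpha * d[lo]
            if hi < n:
                gamma = -c[i] / b[hi]
                nc[i] = gamma * c[hi]
                nb[i] += gamma * a[hi]
                nd[i] += gamma * d[hi]
        a, b, c, d = na, nb, nc, nd
        stride *= 2
        steps += 1

    # All off-diagonal couplings are gone: every row is b[i]*x[i] = d[i].
    return [d[i] / b[i] for i in range(n)], steps


if __name__ == "__main__":
    # Hypothetical 8x8 test system with diagonal 4 and off-diagonals -1.
    n = 8
    a = [0.0] + [-1.0] * (n - 1)
    b = [4.0] * n
    c = [-1.0] * (n - 1) + [0.0]
    d = [1.0] * n
    x, steps = pcr(a, b, c, d)
    print(steps, x)
```

The stride doubles on every pass, so the loop runs ceil(log2 n) times. With one processor per row each pass takes constant time, which gives the O(log n) bound; the total work is O(n log n), more than the O(n) of a sequential elimination.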

