Abstract: Tridiagonal linear systems of equations can be solved on conventional serial machines in a time proportional to N, where N is the number of equations. The conventional algorithms do not lend themselves directly to parallel computation on computers of the ILLIAC IV class, in the sense that they appear to be inherently serial. An efficient parallel algorithm is presented in which computation time grows as log₂ N. The algorithm is based on recursive doubling solutions of linear recurrence relations, and…
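For orientation, the serial O(N) baseline that the abstract refers to can be sketched as follows. This is a minimal illustration of the standard Thomas algorithm (forward elimination followed by back substitution), not code from the cited paper. The function name `thomas_solve` and the array layout (`a` sub-diagonal with `a[0]` unused, `b` diagonal, `c` super-diagonal with `c[-1]` unused, `d` right-hand side) are assumptions made for this example, and no pivoting is performed.

```python
def thomas_solve(a, b, c, d):
    """Solve a tridiagonal system in O(N) serial time (hypothetical helper).

    Row i reads: a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i].
    a[0] and c[-1] are ignored. No pivoting, so the matrix is assumed
    to be well behaved (for example diagonally dominant).
    """
    n = len(b)
    cp = [0.0] * n  # modified super-diagonal
    dp = [0.0] * n  # modified right-hand side

    # Forward elimination: each step depends on the previous one,
    # which is why the method looks inherently serial.
    cp[0] = c[0] / b[0] if n > 1 else 0.0
    dp[0] = d[0] / b[0]
    for i in range(1, n):
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom

    # Back substitution, again a strictly sequential sweep.
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x
```

Both loops carry a dependency from one index to the next, which illustrates why the abstract describes the conventional algorithms as appearing inherently serial.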
“…Both parallelization approaches are based on classical cyclic reduction [15]. Two other parallel algorithms for the solution of tridiagonal equation systems are parallel cyclic reduction [16] and recursive doubling [17]. Recently, Zhang et al [18] discussed the applicability of these algorithms on modern GPUs.…”
Abstract: We have previously suggested mixed precision iterative solvers specifically tailored to the iterative solution of sparse linear equation systems as they typically arise in the finite element discretization of partial differential equations. These schemes have been evaluated for a number of hardware platforms, in particular single precision GPUs as accelerators to the general purpose CPU. This paper reevaluates the situation with new mixed precision solvers that run entirely on the GPU: We demonstrate that mixed precision schemes constitute a significant performance gain over native double precision. Moreover, we present a new implementation of cyclic reduction for the parallel solution of tridiagonal systems and employ this scheme as a line relaxation smoother in our GPU-based multigrid solver. With an alternating direction implicit variant of this advanced smoother we can extend the applicability of the GPU multigrid solvers to very ill-conditioned systems arising from the discretization on anisotropic meshes, that previously had to be solved on the CPU. The resulting mixed precision schemes are always faster than double precision alone, and outperform tuned CPU solvers consistently by almost an order of magnitude.
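The cyclic reduction scheme named in this abstract can be illustrated with a small serial sketch. This is a textbook-style recursive formulation written for clarity, not the GPU implementation the authors describe. The function name `cyclic_reduction` and the array layout (`a[0]` and `c[-1]` must be zero) are assumptions for this example, and no pivoting is performed.

```python
def cyclic_reduction(a, b, c, d):
    """Solve a tridiagonal system by cyclic (odd-even) reduction (sketch).

    Row i reads: a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i],
    with a[0] == 0 and c[-1] == 0 assumed.
    """
    n = len(b)
    if n == 1:
        return [d[0] / b[0]]

    # Reduction: use rows i-1 and i+1 to eliminate x[i-1] and x[i+1]
    # from every odd row i. The odd rows do not depend on each other,
    # so on parallel hardware this loop could run concurrently.
    ra, rb, rc, rd = [], [], [], []
    for i in range(1, n, 2):
        has_next = i + 1 < n
        alpha = a[i] / b[i - 1]
        gamma = c[i] / b[i + 1] if has_next else 0.0
        ra.append(-alpha * a[i - 1])
        rb.append(b[i] - alpha * c[i - 1] - (gamma * a[i + 1] if has_next else 0.0))
        rc.append(-gamma * c[i + 1] if has_next else 0.0)
        rd.append(d[i] - alpha * d[i - 1] - (gamma * d[i + 1] if has_next else 0.0))

    # The odd unknowns now form a tridiagonal system of about half the size.
    x = [0.0] * n
    x[1::2] = cyclic_reduction(ra, rb, rc, rd)

    # Back substitution: each even unknown follows from its own row.
    for i in range(0, n, 2):
        left = a[i] * x[i - 1] if i > 0 else 0.0
        right = c[i] * x[i + 1] if i + 1 < n else 0.0
        x[i] = (d[i] - left - right) / b[i]
    return x
```

Each level halves the number of unknowns, so the recursion depth is about log₂ n, and the work within a level is independent across rows.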
“…The cyclic reduction, the parallel cyclic reduction [25], the recursive doubling [26], and hybrid algorithms were compared with each other in [5]. All considered implementations utilize the local memory and hold the data in-place.…”
Section: Previous Work on Tridiagonal System Solvers on a GPU
Abstract. Two block cyclic reduction linear system solvers are considered and implemented using the OpenCL framework. The topics of interest include a simplified scalar cyclic reduction tridiagonal system solver and the impact of increasing the radix-number of the algorithm. Both implementations are tested for the Poisson problem in two and three dimensions, using an Nvidia GTX 580 series GPU and double precision floating-point arithmetic. The numerical results indicate up to 6-fold speed increase in the case of the two-dimensional problems and up to 3-fold speed increase in the case of the three-dimensional problems when compared to equivalent CPU implementations run on an Intel Core i7 quad-core CPU. The original publication is available at link.springer.com.
“…The earliest parallel solution methods were designed for solution of fine-grained problems, that is, problems with n ≈ p, where n is the size of the problem and p the number of processors (of a supercomputer), and, also, the methods were based on high-speed solution using tridiagonal solvers. The best known of these methods include the recursive-doubling reduction method of Stone [31] and its improved version [32], the odd-even or cyclic reduction technique of Hockney [33,34], and recently, the prefix scheme by Sun [35,36], which is a variation of the cyclic reduction method. Each of the cited parallel solution methods is capable of solving an n-dimensional tridiagonal system in O(log n) time using n processors.…”
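To make the O(log n) claim concrete, the following sketch simulates recursive doubling serially for the simplest case, a first-order linear recurrence x[i] = a[i]*x[i-1] + b[i]. It is an illustration of the doubling idea only, not Stone's full tridiagonal algorithm. The function name `recursive_doubling` and its return convention are assumptions for this example.

```python
def recursive_doubling(a, b):
    """Evaluate x[0] = b[0], x[i] = a[i]*x[i-1] + b[i] by recursive doubling.

    Returns (x, steps). The inner loop over i has no dependency between
    iterations, so with one processor per index each pass would take
    constant time; here it is only simulated serially. a[0] is ignored.
    """
    n = len(b)
    # Invariant: x[i] = A[i] * x[i - stride] + B[i], with A[i] == 0
    # once x[i] is fully resolved (that is, for i < stride).
    A = [0.0] + [float(v) for v in a[1:]]
    B = [float(v) for v in b]
    stride = 1
    steps = 0
    while stride < n:
        new_A, new_B = A[:], B[:]
        for i in range(stride, n):
            # Substitute the expression for x[i - stride] into x[i],
            # doubling the distance that each relation spans.
            new_B[i] = A[i] * B[i - stride] + B[i]
            new_A[i] = A[i] * A[i - stride]
        A, B = new_A, new_B
        stride *= 2
        steps += 1
    return B, steps
```

For n terms the loop runs ceil(log₂ n) passes, compared with the n - 1 dependent steps of the direct sequential evaluation.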
Abstract. The study resulting in this paper applied a parallel algorithm based on a fourth-order compact scheme and suitable for parallel implementation of scientific/engineering systems. The particular system used for demonstration in the study was a time-dependent system solved in parallel on a 2-head-node, 224-compute-node Apple Xserve G5 multiprocessor. The use of the approximation scheme, which necessitated discretizing in both space and time with space width h_x and time step h_t, produced a linear tridiagonal, almost-Toeplitz system. The solution used p processors with p ranging from 3 to 63. The speedups, s_p, approached the limiting value of p only when p was small but yielded poor computational errors, which became progressively better as p increased. The parallel solution is very accurate, with good speedups, but only when p is within a reasonable range of values.