2013
DOI: 10.1016/j.jpdc.2012.10.003
Revisiting parallel cyclic reduction and parallel prefix-based algorithms for block tridiagonal systems of equations

Cited by 17 publications (23 citation statements) · References 18 publications
“…An MPI parallel implementation of cyclic reduction for block-tridiagonal systems was considered in [6], [12]. A block cyclic reduction solver on GPU was used in [1] in the context of a CFD application, with block sizes up to 32.…”
Section: Kernel: Linear System Solves
confidence: 99%
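The block cyclic reduction discussed in the quote operates on M×M blocks; a minimal scalar (M = 1) sketch of the parallel-cyclic-reduction idea is given below. All function and variable names are my own illustration, not taken from the cited implementations:

```python
import numpy as np

def pcr_solve(a, b, c, d):
    """Parallel cyclic reduction for a scalar tridiagonal system T x = d.

    a: sub-diagonal (a[0] must be 0), b: main diagonal,
    c: super-diagonal (c[-1] must be 0), d: right-hand side.
    In each step, every row eliminates its neighbors at distance s,
    doubling the coupling stride; after ceil(log2 n) steps each row
    is decoupled and x[i] = d[i] / b[i].
    """
    n = len(b)
    a, b, c, d = (np.array(v, dtype=float) for v in (a, b, c, d))
    s = 1
    while s < n:
        na, nb, nc, nd = a.copy(), b.copy(), c.copy(), d.copy()
        for i in range(n):
            lo, hi = i - s, i + s
            alpha = -a[i] / b[lo] if lo >= 0 else 0.0   # eliminate x[i-s]
            gamma = -c[i] / b[hi] if hi < n else 0.0    # eliminate x[i+s]
            na[i] = alpha * a[lo] if lo >= 0 else 0.0
            nc[i] = gamma * c[hi] if hi < n else 0.0
            nb[i] = (b[i]
                     + (alpha * c[lo] if lo >= 0 else 0.0)
                     + (gamma * a[hi] if hi < n else 0.0))
            nd[i] = (d[i]
                     + (alpha * d[lo] if lo >= 0 else 0.0)
                     + (gamma * d[hi] if hi < n else 0.0))
        a, b, c, d = na, nb, nc, nd
        s *= 2
    return d / b
```

In the block variant, the scalar divisions above become M×M matrix inversions and the products become matrix-matrix multiplies, which is where the M³ cost factors quoted later come from.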
“…• It was recently shown that the RDA performs better than the CRA depending on the combination of block size (M), number of block rows (N) and processor count (P) specific to the problem at hand, in addition to certain machine-specific constants [6]. • By design, the RDA is more load balanced (and scalable at large P) than the CRA, as the number of active processors halves in every step of the CRA.…”
Section: A Motivation
confidence: 99%
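The load-balance contrast the quote draws can be made concrete with a small counting sketch; the function names are mine, not from [6]:

```python
import math

def cra_active(P):
    """Active processor counts per forward-reduction step of cyclic
    reduction (CRA): the working set halves at every step."""
    counts, p = [], P
    while p > 1:
        p //= 2
        counts.append(p)
    return counts

def rda_active(P):
    """Active processor counts per step of recursive doubling (RDA):
    all P processors stay busy in each of the log2(P) steps."""
    return [P] * int(math.log2(P))
```

For P = 16 this gives [8, 4, 2, 1] for the CRA versus [16, 16, 16, 16] for the RDA, illustrating why the RDA's work stays balanced while the CRA progressively idles processors.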
“…Scalability: One of the first such studies was reported very recently in [6]. The odd-even cyclic reduction and the prefix-computation-based recursive doubling algorithms were reanalyzed carefully to study the effect of block sizes on their parallel scalability.…”
Section: B Related Work
confidence: 99%
“…Parallel cyclic reduction is efficient when the block size M is small and the spatial operator L is dense. Its estimated computation time is 16(C_in + 6 C_mm) M^3 (N/n + log N) + 2β M^2 (log(N/n) + 2 log n), where C_in and C_mm are the amortized times per floating-point operation for matrix inversion and matrix-matrix multiplication, respectively, and β is the average time to transmit one floating-point number between any two processing elements across the network.…”
Section: Acknowledgments
confidence: 99%
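Under one plausible reading of the cost expression above (the grouping of terms into a compute part and a communication part is my reconstruction of the garbled original), the model can be evaluated numerically. The function name and all parameter values here are assumptions for illustration:

```python
import math

def pcr_time(M, N, n, C_in, C_mm, beta):
    """Sketch of the quoted cost model for parallel cyclic reduction.

    M: block size, N: number of block rows, n: processor count,
    C_in / C_mm: amortized time per flop for matrix inversion and
    matrix-matrix multiplication, beta: per-word transfer time.
    Term grouping is an assumed reading of the quoted expression.
    """
    comp = 16 * (C_in + 6 * C_mm) * M**3 * (N / n + math.log2(N))
    comm = 2 * beta * M**2 * (math.log2(N / n) + 2 * math.log2(n))
    return comp + comm
```

With flop times around a nanosecond and word-transfer times around a microsecond, the model shows the expected trade-off: the M³ compute term shrinks as n grows while the M² communication term grows with log n.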