2018
DOI: 10.1016/j.parco.2017.12.004

A distributed-memory hierarchical solver for general sparse linear systems

Abstract: We present a parallel hierarchical solver for general sparse linear systems on distributed-memory machines. For large-scale problems, this fully algebraic algorithm is faster and more memory-efficient than sparse direct solvers because it exploits the low-rank structure of fill-in blocks. Depending on the accuracy of low-rank approximations, the hierarchical solver can be used either as a direct solver or as a preconditioner. The parallel algorithm is based on data decomposition and requires only local communi…
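The abstract's central idea is that fill-in blocks are numerically low-rank, so they can be compressed to a chosen accuracy; a tight tolerance yields a direct solver, a loose one a preconditioner. A minimal sketch of that compression step, using a truncated SVD on a smooth kernel block (the function names and the tolerance knob here are illustrative assumptions, not the paper's actual implementation):

```python
import numpy as np

def low_rank_compress(block, tol):
    """Compress a matrix block to low rank via truncated SVD.

    Singular values below tol * s_max are dropped; tol plays the role
    of the accuracy knob in the abstract (tight tol -> near-direct
    solve, loose tol -> cheap preconditioner). Illustrative only.
    """
    U, s, Vt = np.linalg.svd(block, full_matrices=False)
    rank = max(1, int(np.sum(s > tol * s[0])))
    return U[:, :rank] * s[:rank], Vt[:rank, :]

# A smooth, well-separated kernel block, numerically low-rank,
# standing in for a fill-in block between two distant clusters.
x = np.linspace(0.0, 1.0, 50)
block = 1.0 / (1.0 + np.abs(x[:, None] - (x[None, :] + 2.0)))
L, R = low_rank_compress(block, 1e-8)
err = np.linalg.norm(block - L @ R) / np.linalg.norm(block)
print(L.shape[1], err)
```

The compressed factors store O(nr) entries instead of O(n^2), which is the source of the memory savings claimed over sparse direct solvers.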

Cited by 23 publications (27 citation statements)
References 41 publications
“…This parallelization could resolve the issue of huge memory consumption, which we experienced on shared-memory systems. For this regard, the parallelization proposed for a sparse direct solver (LoRaSp [23]) by Chen et al [22] would be helpful. However, the modification to dense systems is not trivial because the algorithm of IFMM is significantly different from that of LoRaSp.…”
Section: Results
confidence: 99%
“…However, this overfine-grained parallelization would inevitably introduce overwhelming overhead when the number of threads is large. Another closely related approach is the distributed-memory parallel solver introduced by Chen et al [22], which parallelizes the LoRaSp algorithm [23], an analog of the IFMM for solving 1 when A is sparse.…”
Section: Introduction
confidence: 99%
“…Sparsification would then have to be ordered by color, i.e., s i is sparsified before s j if and only if c i < c j . This is similar to what is done in the parallel version of LoRaSp, see [11]. As a result, we used Algorithm 4 to maximize concurrency.…”
Section: Simultaneous Sparsification
confidence: 96%
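The citation above describes ordering sparsification by color: s_i is sparsified before s_j iff c_i < c_j, so that all clusters sharing a color are mutually independent and can be processed concurrently. A small illustrative sketch of that idea using greedy graph coloring (the adjacency data and function names are hypothetical, not taken from either paper):

```python
def greedy_coloring(adj):
    """Greedy graph coloring: each node gets the smallest color not
    used by an already-colored neighbor. Nodes sharing a color form
    an independent set, so they can be sparsified concurrently;
    colors are then processed in increasing order (c_i < c_j)."""
    colors = {}
    for v in sorted(adj):
        used = {colors[u] for u in adj[v] if u in colors}
        c = 0
        while c in used:
            c += 1
        colors[v] = c
    return colors

# Hypothetical interaction graph among clusters s_0..s_4
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4}, 4: {3}}
colors = greedy_coloring(adj)
order = sorted(adj, key=lambda v: colors[v])  # sparsification order
print(colors, order)
```

Within each color class the sparsification steps touch disjoint neighborhoods, which is what makes the color-by-color schedule safe to parallelize.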
“…Different works propose efficient parallel solvers of sparse linear systems: novel strategies and related performance figures are compared with wellknown software packages, together with the discussion of theoretical complexity and accuracy, and/or the use of specific parallel paradigms, run-time systems and accelerators. In [13], a parallel hierarchical solver is proposed and compared with the SuperLU performances. In [39], the performances of a novel multi-frontal solver are discussed and compared with MUMPS [5] and SuperLU.…”
Section: Related Work, Novelty and Main Contributions
confidence: 99%