2017
DOI: 10.1002/cpe.4244
Fast synchronization‐free algorithms for parallel sparse triangular solves with multiple right‐hand sides

Abstract: The sparse triangular solve kernels, SpTRSV and SpTRSM, are important building blocks for a number of numerical linear algebra routines. Parallelizing SpTRSV and SpTRSM on today's manycore platforms, such as GPUs, is not an easy task since computing a component of the solution may depend on previously computed components, enforcing a degree of sequential processing. As a consequence, most existing work introduces a preprocessing stage to partition the components into a group of level-sets or colour-sets so that…
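The level-set preprocessing mentioned in the abstract can be illustrated with a minimal sketch (ours, not from the paper): rows of a lower-triangular CSR matrix are grouped so that every row in a level depends only on rows in earlier levels, letting each level be solved in parallel.

```python
# Hypothetical sketch of level-set analysis for a lower-triangular
# system Lx = b stored in CSR (diagonal entries included).
def level_sets(indptr, indices, n):
    """Group the n rows into levels; rows within a level are independent."""
    level = [0] * n
    for row in range(n):
        # A row's level is one more than the deepest level among the
        # earlier rows it depends on (its strictly-lower column indices).
        deps = [indices[k] for k in range(indptr[row], indptr[row + 1])
                if indices[k] < row]
        level[row] = 1 + max((level[j] for j in deps), default=-1)
    nlevels = max(level) + 1
    sets = [[] for _ in range(nlevels)]
    for row in range(n):
        sets[level[row]].append(row)
    return sets
```

For a 4x4 lower-triangular matrix where row 1 depends on row 0 and row 3 depends on row 1, this yields the levels `[[0, 2], [1], [3]]`: rows 0 and 2 can be solved concurrently, then row 1, then row 3. The cost of this analysis, plus the per-level kernel launches it implies, is exactly the overhead the paper's synchronization-free approach avoids.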

Cited by 43 publications (25 citation statements)
References 44 publications
“…Recently, Liu et al presented a self‐scheduled two‐stage GPU method for matrices in the CSC format, based on a lightweight analysis phase that avoids the constant synchronization with the CPU implied by launching a kernel per level‐set in the cuSparse routine. To the best of our knowledge, there are no other significant works that apply this type of algorithm to solve sparse triangular systems on hardware accelerators.…”
Section: Related Work
confidence: 99%
“…In order to keep track of this, we store an integer ready vector with one entry per unknown, set to one once that unknown has been solved and zero otherwise. Unlike the work presented by Liu et al, our algorithm is tailored specifically for the CSR matrix format; it avoids atomic operations and does not require a preprocessing stage.…”
Section: Proposal
confidence: 99%
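The ready-vector idea quoted above can be sketched as follows (a minimal illustration under our own naming, not the cited authors' code). Shown sequentially for clarity; in the parallel version each row is handled by its own thread, which spins on the `ready` flags of its dependencies instead of the assertion below.

```python
# Hypothetical sketch of a ready-vector sparse triangular solve for
# Lx = b with L in CSR format (diagonal included in each row).
def sptrsv_ready(indptr, indices, data, b):
    n = len(b)
    ready = [0] * n          # ready[j] == 1 once x[j] has been computed
    x = [0.0] * n
    for row in range(n):     # in the parallel version, rows run concurrently
        s = b[row]
        diag = 1.0
        for k in range(indptr[row], indptr[row + 1]):
            col = indices[k]
            if col == row:
                diag = data[k]
            else:
                # A GPU thread would busy-wait here until ready[col] == 1.
                assert ready[col] == 1
                s -= data[k] * x[col]
        x[row] = s / diag
        ready[row] = 1       # publish the solved component to other rows
    return x
```

Because each row only reads flags and writes its own entry, no level-set preprocessing is needed; correctness on a GPU would additionally require the write to `x[row]` to be visible before the flag is set (e.g. via a memory fence).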
“…Compared to stochastic gradient descent (SGD) [8,9], the ALS algorithm is not only inherently parallel but can also incorporate implicit ratings [1]. Nevertheless, ALS involves parallel sparse matrix manipulation [10], for which achieving high performance is challenging due to imbalanced workloads [11,12,13], random memory accesses [14,15], unpredictable amounts of computation [16], and task dependencies [17,18,19]. This holds in particular when parallelizing and optimizing ALS on modern multi‐core and many‐core platforms [20].…”
Section: Introduction
confidence: 99%
“…To the best of our knowledge, most existing parallel triangular solvers target shared‐memory machines or GPUs (see [2,16,17,25] and references therein). These solvers often rely on well‐known techniques such as level‐set, color‐set, or block scheduling algorithms.…”
Section: Introduction
confidence: 99%