2018
DOI: 10.1007/978-3-319-78024-5_22

NVIDIA GPUs Scalability to Solve Multiple (Batch) Tridiagonal Systems Implementation of cuThomasBatch

citations
Cited by 16 publications
(20 citation statements)
references
References 8 publications
“…We evaluate the scalability of both approaches, gtsvStridedBatch and cuThomasBatch, for computing multiple and independent tridiagonal systems on NVIDIA GPUs. The present work extends the previously published works [13] with additional contributions. This work includes a new approach for the cuThomasBatch, which makes use of a unified vector in order to exploit the memory hierarchy more efficiently.…”
Section: Introduction (supporting)
confidence: 83%
“…The main contribution of this work is a novel and highly scalable implementation able to deal with multi-morphology simulations based on cuThomasBatch implementation [25]. Although in this paper the cuThomasBatch was proven to be a fast implementation for batches of full-tridiagonal systems, this is not enough to compute the sparsity found in Hines matrices.…”
Section: Related Work (mentioning)
confidence: 96%
“…In order to solve the above scheme in batched form on a GPU we follow the methodology of cuThomasBatch [1] with some modifications. We retain the key aspect of interleaved data layout, this means that the first row of the batch data will contain the first entry in each linear system Ax_i = f_i (the subscript i labels the different systems in the batch), the second row the second entry and so on.…”
Section: F Implementation On GPU (mentioning)
confidence: 99%
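The interleaved layout described in the statement above can be illustrated with a minimal CPU-side sketch. It is not the cuThomasBatch source or any CUDA library code: the function names `interleave` and `thomas_interleaved` are hypothetical, and the solver is a plain Thomas sweep without pivoting, assuming each system is well conditioned (for example, diagonally dominant). Entry j of system i is stored at flat index `j * batch + i`, so on a GPU neighbouring threads (one per system) would read neighbouring memory locations.

```python
def interleave(systems):
    """Pack per-system vectors into one flat list in interleaved order.

    Entry j of system i lands at index j * batch + i, so "row" j of the
    batch holds the j-th entry of every system.
    """
    batch = len(systems)
    n = len(systems[0])
    flat = [0.0] * (n * batch)
    for i, vec in enumerate(systems):
        for j, value in enumerate(vec):
            flat[j * batch + i] = value
    return flat


def thomas_interleaved(lower, diag, upper, rhs, n, batch):
    """Solve `batch` independent n-by-n tridiagonal systems (illustrative).

    All four inputs are flat lists in the interleaved layout. The first
    lower entry and the last upper entry of each system are ignored.
    No pivoting is done, so a zero pivot raises ZeroDivisionError.
    """
    c = list(upper)  # scratch copy: modified super-diagonal
    d = list(rhs)    # scratch copy: becomes the solution
    for i in range(batch):  # on a GPU this loop would be one thread per system
        # Forward elimination.
        c[i] = c[i] / diag[i]
        d[i] = d[i] / diag[i]
        for j in range(1, n):
            k = j * batch + i   # current row of system i
            p = k - batch       # previous row of the same system
            denom = diag[k] - lower[k] * c[p]
            c[k] = c[k] / denom
            d[k] = (d[k] - lower[k] * d[p]) / denom
        # Back substitution.
        for j in range(n - 2, -1, -1):
            k = j * batch + i
            d[k] -= c[k] * d[k + batch]
    return d
```

A call would pass the three diagonals and the right-hand sides through `interleave` first, then read the solution of system i back from indices `i`, `i + batch`, `i + 2 * batch`, and so on.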
“…The starting-point for developing the batched pentadiagonal solver is an existing batched tridiagonal solver called cuThomasBatch [1], based on the Thomas Algorithm, and now part of the CUDA library as gtsvInterleavedBatch. We herein extend cuThomasBatch to accommodate pentadiagonal problems.…”
Section: Introduction (mentioning)
confidence: 99%