2021
DOI: 10.1007/978-3-030-71593-9_14

Abstract: We show that a recently developed divide-and-conquer parallel algorithm for solving tridiagonal Toeplitz systems of linear equations can be implemented easily and efficiently on a variety of modern multicore and GPU architectures, as well as on hybrid systems. Our new portable implementation, which uses OpenACC, can be executed on both CPU-based and GPU-accelerated systems. More sophisticated variants of the implementation are suitable for systems with multiple GPUs and it can use CPU and GPU c…

Citations: cited by 1 publication (10 citation statements)
References: 12 publications (14 reference statements)
“…Our OpenMP implementation of the algorithm achieved satisfying speedup on Intel Xeon CPUs and Intel Xeon Phi. In our next paper, 13 we showed further improvements in the implementation of the algorithm using more sophisticated vectorization techniques. We also used OpenACC, a standard for accelerated computing, 14,15 which introduces compiler directives for offloading selected computations from host to attached accelerator devices.…”
Section: Introduction (mentioning)
confidence: 90%
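“…Moreover, it cannot be automatically vectorized and parallelized. To obtain an efficient vectorizable parallel algorithm for solving () let us consider the following divide and conquer method 7,13. First, we choose two integers $r, s > 1$, $rs = n$, and rewrite $L$ in the following block form: $$L = \begin{pmatrix} L_s & & & \\ B & L_s & & \\ & \ddots & \ddots & \\ & & B & L_s \end{pmatrix}, \quad L_s = \begin{pmatrix} 1 & & & \\ \alpha & 1 & & \\ & \ddots & \ddots & \\ & & \alpha & 1 \end{pmatrix}, \quad B = \begin{pmatrix} 0 & \cdots & 0 & \alpha \\ 0 & \cdots & 0 & 0 \\ \vdots & & \vdots & \vdots \\ 0 & \cdots & 0 & 0 \end{pmatrix}.$$ Let us define $e_k = (\underbrace{0, \ldots, 0}_{k}, 1, 0, \ldots, 0)^T \in \mathbb{R}^s$, $k = 0, \ldots, s-1$, and split $d$, $z$ into vectors …”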
Section: Parallel Algorithms (mentioning)
confidence: 99%
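
The quoted statement is cut off before it lists the steps of the method, so the following is only a minimal sketch of what the block form suggests, not the cited authors' implementation (which, according to the abstract, uses OpenACC). It assumes that d and z are the right-hand side and the solution of L z = d, and it relies on the observation that B times a block of z equals alpha times the last entry of that block times e_0. The function names `solve_direct` and `solve_blockwise` are hypothetical.

```python
def solve_direct(d, alpha):
    """Sequential forward substitution for L z = d, where L has 1 on the
    diagonal and alpha on the subdiagonal."""
    z = [d[0]]
    for i in range(1, len(d)):
        z.append(d[i] - alpha * z[i - 1])
    return z


def solve_blockwise(d, alpha, r, s):
    """Solve the same system using the r-by-r block form with s-by-s blocks.
    Assumption: this three-step scheme is derived here from the block form
    alone; it is an illustration, not the published algorithm."""
    if len(d) != r * s:
        raise ValueError("expected len(d) == r * s")

    # Step 1: solve L_s x_i = d_i for every block. The r solves do not
    # depend on each other, so they could run in parallel.
    x = [solve_direct(d[i * s:(i + 1) * s], alpha) for i in range(r)]

    # y = L_s^{-1} e_0 = (1, -alpha, alpha^2, ...).
    y = [(-alpha) ** j for j in range(s)]

    # Step 2: short sequential recurrence (length r) for the last entry of
    # each solution block z_i, since z_i = x_i - alpha * last(z_{i-1}) * y.
    last = [x[0][-1]]
    for i in range(1, r):
        last.append(x[i][-1] - alpha * last[i - 1] * y[-1])

    # Step 3: correct blocks 1..r-1. These updates are again independent.
    z = list(x[0])
    for i in range(1, r):
        c = alpha * last[i - 1]
        z.extend(x[i][j] - c * y[j] for j in range(s))
    return z


# Usage: compare the blockwise result with plain forward substitution.
d = [float(k) for k in range(1, 13)]
z_ref = solve_direct(d, 0.5)
z_dc = solve_blockwise(d, 0.5, r=3, s=4)
max_diff = max(abs(a - b) for a, b in zip(z_ref, z_dc))
```

Only the recurrence in step 2 is inherently sequential, and it has length r rather than n; steps 1 and 3 operate on each block separately, which is the property that makes this kind of splitting attractive for vectorization and parallel execution.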