2021 **Abstract:** We show that a recently developed divide-and-conquer parallel algorithm for solving tridiagonal Toeplitz systems of linear equations can be easily and efficiently implemented for a variety of modern multicore and GPU architectures, as well as hybrid systems. Our new portable implementation, which uses OpenACC, can be executed on both CPU-based and GPU-accelerated systems. More sophisticated variants of the implementation are suitable for systems with multiple GPUs and it can use CPU and GPU c…

(10 citation statements)

(14 reference statements)

“…Our OpenMP implementation of the algorithm achieved satisfying speedup on Intel Xeon CPUs and Intel Xeon Phi. In our next paper [13], we showed further improvements in the implementation of the algorithm using more sophisticated vectorization techniques. We also used OpenACC, a standard for accelerated computing [14,15], which introduces compiler directives for offloading selected computations from host to attached accelerator devices.…”

confidence: 90%

“…Moreover, it cannot be automatically vectorized and parallelized. To obtain an efficient vectorizable parallel algorithm for solving () let us consider the following divide and conquer method [7,13]. First, we choose two integers $r,s>1$, $rs=n$, and rewrite $L$ in the following block form: $$L=\left[\begin{array}{cccc}L_{s}& & & \\ B& L_{s}& & \\ & \ddots & \ddots & \\ & & B& L_{s}\end{array}\right],\qquad L_{s}=\left[\begin{array}{cccc}1& & & \\ \alpha & 1& & \\ & \ddots & \ddots & \\ & & \alpha & 1\end{array}\right],\qquad B=\left[\begin{array}{cccc}0& \dots & 0& \alpha \\ \vdots & & 0& 0\\ \vdots & \ddots & & \vdots \\ 0& \dots & \dots & 0\end{array}\right].$$ Let us define $\mathbf{e}_{k}={(\underbrace{0,\dots ,0}_{k},1,0,\dots ,0)}^{T}\in {\mathbb{R}}^{s},\ k=0,\dots ,s-1,$ and split $\mathbf{d}$, $\mathbf{z}$ into vectors $$…”

confidence: 99%
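
The block form quoted above can be illustrated with a small sketch. The quoted statement is truncated before the actual algorithm steps, so the following is only a reconstruction from the block structure, not the authors' implementation. It assumes the system being solved is $L\mathbf{z}=\mathbf{d}$, and it uses the fact that $B\mathbf{z}_{i-1}=\alpha\,(\mathbf{z}_{i-1})_{s-1}\,\mathbf{e}_{0}$, so each block satisfies $L_{s}\mathbf{z}_{i}=\mathbf{d}_{i}-\alpha\,(\mathbf{z}_{i-1})_{s-1}\,\mathbf{e}_{0}$. The names `solve_sequential`, `solve_blocked`, `x`, `y`, and `last` are hypothetical.

```python
def solve_sequential(alpha, d):
    """Forward substitution for L z = d, where L has 1 on the diagonal
    and alpha on the subdiagonal: z[0] = d[0], z[i] = d[i] - alpha * z[i-1]."""
    z = [0.0] * len(d)
    for i, di in enumerate(d):
        z[i] = di - (alpha * z[i - 1] if i > 0 else 0.0)
    return z


def solve_blocked(alpha, d, r, s):
    """Solve the same system using r diagonal blocks L_s of size s (r * s == n).

    Assumption: this three-step scheme is derived from the block form only;
    the source text does not show the paper's exact formulation.
    """
    assert r * s == len(d)

    # Step 1: independent local solves L_s x_i = d_i (one per block).
    x = [solve_sequential(alpha, d[i * s:(i + 1) * s]) for i in range(r)]

    # y = L_s^{-1} e_0 = (1, -alpha, alpha^2, ...), shared by all blocks.
    y = [(-alpha) ** k for k in range(s)]

    # Step 2: short sequential recurrence for the last entry of each z_i.
    last = [0.0] * r
    last[0] = x[0][s - 1]
    for i in range(1, r):
        last[i] = x[i][s - 1] - alpha * last[i - 1] * y[s - 1]

    # Step 3: independent corrections z_i = x_i - alpha * last[i-1] * y.
    z = list(x[0])
    for i in range(1, r):
        c = alpha * last[i - 1]
        z.extend(x[i][k] - c * y[k] for k in range(s))
    return z


if __name__ == "__main__":
    d = [float(k) for k in range(1, 13)]
    print(solve_blocked(0.5, d, r=3, s=4))
```

In this sketch, steps 1 and 3 have no dependencies between blocks, which is where vectorization and parallelism could be applied; only the length-$r$ recurrence in step 2 is inherently sequential.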