2018
DOI: 10.1007/s11227-018-2460-0

Unleashing the performance of ccNUMA multiprocessor architectures in heterogeneous stencil computations

Abstract: This paper meets the challenge of harnessing the heterogeneous communication architecture of ccNUMA multiprocessors for heterogeneous stencil computations, an important example of which is the Multidimensional Positive Definite Advection Transport Algorithm (MPDATA). We propose a method for optimizing the parallel implementation of heterogeneous stencil computations that combines the islands-of-cores strategy with (3+1)D decomposition. The method allows flexible management of the trade-off between …
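The abstract is truncated before it explains either ingredient, so the following is only a minimal Python sketch of the general islands idea, under stated assumptions: the outermost grid dimension is split into contiguous slabs, one per island of cores; each island also recomputes `halo` rows owned by its neighbours so that islands need not synchronize with one another; and each island's computed rows are divided among its threads. The names `Island`, `split_range`, `plan_islands`, and `redundant_rows`, and the choice to split along a single dimension, are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Island:
    """One hypothetical group of cores and its share of the grid."""
    island_id: int
    owned: range        # rows whose final results this island writes back
    computed: range     # owned rows plus halo rows recomputed redundantly
    thread_rows: list   # `computed` split into contiguous per-thread chunks


def split_range(start, stop, parts):
    """Split [start, stop) into `parts` contiguous, nearly equal ranges."""
    base, extra = divmod(stop - start, parts)
    chunks, lo = [], start
    for p in range(parts):
        hi = lo + base + (1 if p < extra else 0)
        chunks.append(range(lo, hi))
        lo = hi
    return chunks


def plan_islands(n_rows, n_islands, threads_per_island, halo):
    """Assign a slab of rows to each island; each island also recomputes
    `halo` rows on each side that belong to a neighbour, trading redundant
    work for the absence of inter-island synchronization."""
    plan = []
    for idx, owned in enumerate(split_range(0, n_rows, n_islands)):
        computed = range(max(0, owned.start - halo),
                         min(n_rows, owned.stop + halo))
        threads = split_range(computed.start, computed.stop, threads_per_island)
        plan.append(Island(idx, owned, computed, threads))
    return plan


def redundant_rows(plan, n_rows):
    """Number of row evaluations beyond the n_rows a single island would do."""
    return sum(len(isl.computed) for isl in plan) - n_rows


if __name__ == "__main__":
    for n_islands in (1, 2, 4):
        plan = plan_islands(64, n_islands, threads_per_island=4, halo=3)
        print(n_islands, "islands ->", redundant_rows(plan, 64), "redundant rows")
```

For 64 rows and a halo of 3, one, two, and four islands give 0, 6, and 18 redundant rows: every interior island boundary costs `2 * halo` extra rows. This is one concrete form of a trade-off between extra computation and reduced inter-island communication.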

Cited by 4 publications (8 citation statements)
References: 14 publications

“…As a result, the relatively low operational intensity of each MPDATA kernel [4] is not high enough to efficiently utilize the resources of modern processors. In our works [4], [11], [21], [35], [38], a set of optimizations was developed to exploit resources of multicore ccNUMA/SMP systems more efficiently. The resulting parallelization methodology consists of the following parametric optimization steps: [21] - this step explores spatial blocking across the different kernels, employing overlapped tiling with redundant computations, while all kernels are grouped into five packages using loop fusion.…”
Section: Parallelization Methodology for MPDATA Code on Shared Memory Systems (mentioning, confidence: 99%)

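The statement above mentions spatial blocking across kernels with overlapped tiling, redundant computations, and loop fusion. The following minimal Python sketch shows that general technique on a toy 1D pipeline of two stencil stages. `kernel1` and `kernel2` are hypothetical stand-ins, not MPDATA kernels. The dictionary `b` represents a tile-local buffer that, in a real implementation, would stay in cache.

```python
def kernel1(a, i):
    """First stage: hypothetical 3-point stencil (not an actual MPDATA kernel)."""
    return a[i - 1] + 2 * a[i] + a[i + 1]


def kernel2(b, i):
    """Second stage: reads neighbours of the first stage's output."""
    return b[i + 1] - b[i - 1]


def reference(a):
    """Unfused baseline: each kernel sweeps the whole array in turn.
    Returns the output and the number of stage-1 evaluations."""
    n = len(a)
    b = {i: kernel1(a, i) for i in range(1, n - 1)}
    return [kernel2(b, i) for i in range(2, n - 2)], len(b)


def fused_overlapped(a, tile):
    """Both kernels fused into one loop over tiles of the output.

    Stage 1 is evaluated on the tile plus a one-point halo on each side.
    Halo points are also evaluated by the neighbouring tile (redundant work),
    so each tile is self-contained and needs no barrier between the stages."""
    n = len(a)
    out, stage1_evals = [], 0
    for t0 in range(2, n - 2, tile):
        t1 = min(t0 + tile, n - 2)
        # Tile-local intermediate covering [t0 - 1, t1], i.e. tile plus halo.
        b = {i: kernel1(a, i) for i in range(t0 - 1, t1 + 1)}
        stage1_evals += len(b)
        out.extend(kernel2(b, i) for i in range(t0, t1))
    return out, stage1_evals
```

With 20 input points and `tile=4`, the fused version produces the same output as the baseline but evaluates stage 1 24 times instead of 18. The 6 extra evaluations are the overlapping halo points, which pay for tiles that never wait on each other.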
“…Through this die, a given CCD can communicate with other CCDs and the main memory, as well as with external devices connected by the PCIe bus. As a result, the EPYC 7742 CPU can provide one NUMA domain for a single processor, which is equivalent to the NUMA layout offered by current Intel Xeon CPUs [35]. This mode is known as NPS1 [5].…”
Section: Related Work (mentioning, confidence: 99%)

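How many NUMA domains the operating system exposes affects how cores could be grouped on such a machine. An NPS1 EPYC 7742 would report one domain per processor. On Linux, each NUMA node appears as a directory `/sys/devices/system/node/nodeN` whose `cpulist` file lists its logical CPUs in range notation such as `0-63,128-191`. The following minimal sketch parses that notation and maps nodes to CPUs. The helper names `parse_cpulist` and `numa_nodes` are hypothetical, and the sketch assumes a Linux sysfs layout; on other systems it returns an empty mapping.

```python
import glob
import os


def parse_cpulist(text):
    """Parse sysfs range notation like '0-3,8,10-11' into a list of CPU ids."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue  # empty file (e.g. a memory-only node)
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def numa_nodes(sysfs="/sys/devices/system/node"):
    """Map NUMA node id -> logical CPUs; empty dict if sysfs is absent."""
    nodes = {}
    for path in glob.glob(os.path.join(sysfs, "node[0-9]*")):
        node_id = int(os.path.basename(path)[len("node"):])
        with open(os.path.join(path, "cpulist")) as f:
            nodes[node_id] = parse_cpulist(f.read())
    return dict(sorted(nodes.items()))  # numeric order, not 'node10' < 'node2'


if __name__ == "__main__":
    for node, cpus in numa_nodes().items():
        print(f"node {node}: {len(cpus)} logical CPUs")
```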
“…The next optimization step (version C) fits perfectly into multi-socket architectures [36], [42]. As shown in Fig.…”
Section: Energy/Power and Performance Comparison For… (mentioning, confidence: 99%)

“…To alleviate the memory-bound nature of MPDATA, we developed [9], [36], [37], [38] a parallelization methodology for MPDATA heterogeneous stencil computations. It contributes to ease the memory and communication bounds, and exploits resources of multicore ccNUMA/SMP systems better.…”
Section: MPDATA Parallelization (mentioning, confidence: 99%)