OpenMP application experiences: Porting to accelerated nodes

Bak, Seonmyeong; Bertoni, Colleen; Boehm, Swen; Budiardja, Reuben D.; Chapman, Barbara; Doerfert, Johannes; Eisenbach, Markus; Finkel, Hal; Hernández, Óscar; Huber, Joseph; Iwasaki, Shinsuke; Kale, Vivek; Kent, Paul; Kwack, JaeHyuk; Lin, Meifeng; Łuszczek, Piotr; Luo, Ye; Pham, Buu Q.; Pophale, Swaroop; Ravikumar, K.; Sarkar, Vivek; Scogland, Thomas R. W.; Tian, Shilei; Yeung, P. K.

doi:10.1016/j.parco.2021.102856

Cited by 26 publications

(8 citation statements)

References 22 publications


“…Compared to the traditional Poisson algorithm which mostly uses principal component analysis for normal estimation, this paper proposes an improved method. Firstly, an octree is used instead of a KDtree to search the nearest neighborhood; then the normal of the point cloud is estimated by moving least squares and accelerated by OpenMP [14], and then the normal direction is adjusted consistently by a least-cost spanning tree. The traditional Poisson reconstruction algorithm is prone to generate pseudo-surfaces.…”

Section: Whole Methods, mentioning

confidence: 99%

An Improved Poisson Surface Reconstruction Algorithm based on the Boundary Constraints

Liu¹,

Wang²,

Tahir³

et al. 2023

IJACSA


Point cloud surface reconstruction is widely used in various fields to generate high-precision 3D models. To address the insufficient accuracy, pseudo-surfaces, and high time cost of traditional point cloud surface reconstruction algorithms, this paper proposes an improved Poisson surface reconstruction algorithm based on boundary constraints. For high-density point cloud data obtained from 3D laser scanning, the proposed method first uses an octree instead of a KD-tree to search the nearest neighborhood; then, it uses Open Multi-Processing (OpenMP) to accelerate normal estimation based on the moving least squares algorithm; meanwhile, a least-cost spanning tree is employed to make the normal directions consistent; and finally, a screened Poisson algorithm with Neumann boundary constraints is used to reconstruct the surface from the point cloud. Experiments on three open datasets demonstrated that, compared with the traditional methods, the proposed method can effectively reduce the generation of pseudo-surfaces. The reconstruction time of the proposed algorithm is about 16% shorter than that of the traditional Poisson reconstruction algorithm, and it produces better reconstruction results in terms of quantitative analysis and visual comparison.


“…The training and outreach activity is a cross-cutting effort which is supported by resources from SOLLVE and ECP Broader Engagement, with contributions by external collaborators, notably Lawrence Berkeley National Laboratory. A number of articles have also been published as part of the SOLLVE effort [87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,100,105,71].…”

Section: Validation and Verification (VandV), mentioning

confidence: 99%

ECP Software Technology Capability Assessment Report V3.0

Heroux

McInnes

et al. 2022


The Exascale Computing Project (ECP) Software Technology (ST) focus area is responsible for (1) developing critical software capabilities that will enable the successful execution of ECP applications and (2) providing key components of a productive and sustainable exascale computing ecosystem that will position the US Department of Energy (DOE) and the broader high-performance computing (HPC) community with a firm foundation for future extreme-scale computing capabilities.

This ECP ST Capability Assessment Report (CAR) provides an overview and assessment of current ECP ST capabilities and activities, giving stakeholders and the broader HPC community information that can be used to assess ECP ST progress and plan their own efforts accordingly. ECP ST leaders commit to updating this document on regular basis (every 6-12 months). Highlights from this version of the report are presented here.

This version of the CAR contains the following updates relative to the previous revision.

• This report highlights the progress with the Extreme-scale Scientific Software Stack (E4S) efforts. In particular, this report discusses how E4S continues to gain traction as a first-class entity in the HPC ecosystem, enabling new conversations with users, facilities, vendors, other US agencies, and international partners.
• The several-page summaries of each ECP Level 4 project were updated to reflect recent progress and next steps (Section 4). Of particular note are the experiences of our teams on early-access systems for Frontier.
• The E4S is described further. E4S is now updated via quarterly releases. E4S is the primary integration and delivery vehicle for ECP ST capabilities (Section 2.1.1).
• The ECP ST software development kit (SDK) effort further refined its groupings (Section 2.1.2).

The ECP ST focus area represents the key bridge between exascale systems and the scientists developing applications that will run on those platforms. ECP ST efforts contribute to approximately 70 software products (Section 2.1.3) in six technical areas (Table 1). Since publishing the previous revision of the CAR, the team has continued to evolve the product dictionary of official product names, which enables more rigorous mapping of ECP ST deliverables to stakeholders (Section 2.1.4).

Programming Models & Runtimes: In addition to developing key enhancements to MPI and OpenMP for scalable systems with accelerated node architectures, the team is working on performance portability layers (Kokkos and RAJA) and participating in OpenMP and OpenACC software design and development that will enable applications to write much of their source code without needing to provide vendor-specific implementations for each exascale system. One legacy of ECP ST efforts is anticipated to be a software stack that supports Intel and AMD accelerators in addition to NVIDIA's accelerators (Section 4.1).

Development Tools: The team is enhancing existing widely used compilers (e.g., LLVM) and performance tools for next-generation platforms. Compilers are critical for heterogeneous archi...


“…Ravikumar et al [34] performed spectral simulation of turbulent flows using their own asynchronous batched GPU-FFT. In addition, Bak et al [9] performed weak scaling of their synchronous non-batched GPU-FFT on Summit. They observed that a FFT of 3072³ grid using 96 V100 GPUs (16 nodes) of Summit took 550 milliseconds [1].…”

Section: Comparison of GPU-FFT with Other FFTs, mentioning

confidence: 99%

“…They observed a GPU to CPU speedup of 4.7 for 12288³ grid and a speedup of 2.9 for 18432³ grid. Recently, Bak et al [9] measured the performance of a synchronous non-batched version of this GPU-FFT on 1024 nodes of Summit and obtained a maximum GPU to CPU speedup of 2.57 for 12228³ grid.…”

Section: Introduction, mentioning

confidence: 99%

Scalable Multi-node Fast Fourier Transform on GPUs

Verma¹,

Chatterjee²,

Garg³

et al. 2022

Preprint


In this paper, we present the details of our multi-node GPU-FFT library, as well as its scaling on the Selene HPC system. Our library employs slab decomposition for data division and MPI for communication among GPUs. We performed GPU-FFT on 1024³, 2048³, and 4096³ grids using a maximum of 512 A100 GPUs. We observed good scaling for the 4096³ grid with 64 to 512 GPUs. We report that the timing of a multicore FFT of a 1536³ grid with 196608 cores of a Cray XC40 is comparable to that of a GPU-FFT of a 2048³ grid with 128 GPUs. The efficiency of GPU-FFT is due to the fast computation capabilities of the A100 card and efficient communication via NVLink.
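As a rough illustration of the slab decomposition mentioned in this abstract, the sketch below splits one axis of an N³ grid into contiguous slabs of planes, one slab per GPU. It is not taken from the cited library: the function name `slab_bounds` is hypothetical, and the per-GPU memory figure assumes double-precision complex values (16 bytes per grid point), which the abstract does not state.

```python
def slab_bounds(n, num_ranks):
    """Return the (start, stop) plane range owned by each rank.

    Hypothetical helper: splits n planes along one axis into num_ranks
    contiguous slabs, giving the first (n % num_ranks) ranks one extra plane.
    """
    base, extra = divmod(n, num_ranks)
    bounds = []
    start = 0
    for rank in range(num_ranks):
        size = base + (1 if rank < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


# Largest configuration quoted in the abstract: a 4096^3 grid on 512 GPUs.
N = 4096
GPUS = 512
bounds = slab_bounds(N, GPUS)
planes_per_gpu = bounds[0][1] - bounds[0][0]

# Assumption: 16 bytes per point (complex128); the abstract gives no precision.
BYTES_PER_POINT = 16
bytes_per_gpu = planes_per_gpu * N * N * BYTES_PER_POINT

print(planes_per_gpu, bytes_per_gpu / 2**30)
```

Because each slab must contain at least one full plane, this kind of decomposition cannot use more ranks than there are planes along the split axis.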
