Matthew Leinhauser scite author profile

Recent progress in applying machine learning (ML) / artificial intelligence (AI) algorithms to improve EFIT equilibrium reconstruction for fusion data analysis applications is presented. A device-independent, portable core equilibrium solver capable of computing or reconstructing equilibria for different tokamaks has been created to facilitate the adaptation of ML/AI algorithms. A large EFIT database comprising DIII-D magnetic, Motional Stark Effect (MSE), and kinetic reconstruction data has been generated for the development of EFIT Model-Order-Reduction (MOR) surrogate models that reconstruct approximate equilibrium solutions. A neural-network (NN) MOR surrogate model has been successfully trained and tested using the magnetically reconstructed datasets with encouraging results. Other progress includes the development of a Gaussian-Process (GP) Bayesian framework that can adapt its many hyperparameters to improve processing of experimental input data, and of a 3D perturbed equilibrium database from toroidal full magnetohydrodynamic linear response modeling using the MARS-F code for developing 3D-MOR surrogate models.

Metrics and Design of an Instruction Roofline Model for AMD GPUs

Leinhauser

Widera

ACM Trans. Parallel Comput.

et al. 2022

Due to the recent announcement of the Frontier supercomputer, many scientific application developers are working to make their applications compatible with AMD (CPU-GPU) architectures, which means moving away from the traditional CPU and NVIDIA-GPU systems. Due to the current limitations of profiling tools for AMD GPUs, this shift leaves a void in how to measure application performance on AMD GPUs. In this article, we design an instruction roofline model for AMD GPUs using AMD’s ROCProfiler and a benchmarking tool, BabelStream (the HIP implementation), as a way to measure an application’s performance in instructions and memory transactions on new AMD hardware. Specifically, we create instruction roofline models for a case study scientific application, PIConGPU, an open source particle-in-cell simulations application used for plasma and laser-plasma physics on the NVIDIA V100, AMD Radeon Instinct MI60, and AMD Instinct MI100 GPUs. When looking at the performance of multiple kernels of interest in PIConGPU we find that although the AMD MI100 GPU achieves a similar, or better, execution time compared to the NVIDIA V100 GPU, profiling tool differences make comparing performance of these two architectures hard. When looking at execution time, GIPS, and instruction intensity, the AMD MI60 achieves the worst performance out of the three GPUs used in this work.
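The quantities named in this abstract (GIPS, instruction intensity, and the roofline ceiling) can be illustrated with a minimal sketch. It follows the general idea of an instruction roofline and is not the authors' exact metric definitions. Everything here is an assumption made for illustration: the `KernelCounters` fields, the choice to scale thread-level instruction counts by the scheduling-group size (for example 32 or 64 threads), and all numeric values, none of which are measurements from the article.

```python
from dataclasses import dataclass


@dataclass
class KernelCounters:
    """Hypothetical per-kernel profiler readings (field names are illustrative)."""
    instructions: float  # thread-level instructions executed (assumed counter)
    transactions: float  # memory transactions issued
    runtime_s: float     # kernel execution time in seconds
    group_size: int      # threads per scheduling group, e.g. 32 or 64


def group_instructions(k: KernelCounters) -> float:
    """Scale thread-level instructions down to scheduling-group level."""
    return k.instructions / k.group_size


def achieved_gips(k: KernelCounters) -> float:
    """Billions of group-level instructions executed per second."""
    return group_instructions(k) / 1e9 / k.runtime_s


def instruction_intensity(k: KernelCounters) -> float:
    """Group-level instructions per memory transaction."""
    return group_instructions(k) / k.transactions


def roofline_bound(intensity: float, peak_gips: float, peak_gtxn_per_s: float) -> float:
    """Attainable GIPS: the lower of the compute ceiling and the memory ceiling."""
    return min(peak_gips, intensity * peak_gtxn_per_s)


# Made-up counter values, chosen only so the arithmetic is easy to follow.
kernel = KernelCounters(instructions=6.4e10, transactions=2.0e9,
                        runtime_s=0.5, group_size=64)
gips = achieved_gips(kernel)
intensity = instruction_intensity(kernel)
# Peak values below are placeholders, not specifications of any real GPU.
bound = roofline_bound(intensity, peak_gips=100.0, peak_gtxn_per_s=20.0)
print(f"GIPS={gips:.2f} intensity={intensity:.2f} bound={bound:.2f}")
```

In a sketch like this, a kernel whose achieved GIPS sits well below `bound` at its intensity has headroom; when the bound is set by `intensity * peak_gtxn_per_s` rather than `peak_gips`, the kernel is limited by memory transactions rather than instruction throughput.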

Performance Analysis of PIConGPU: Particle-in-Cell on GPUs using NVIDIA’s NSight Systems and NSight Compute

Leinhauser

Young

et al. 2021

Challenges Porting a C++ Template-Metaprogramming Abstraction Layer to Directive-Based Offloading

Kelling

Debus

et al. 2022

Hardware-Agnostic Interactive Exascale In Situ Visualization of Particle-In-Cell Simulations

Meyer

Hernandez

Pausch

et al. 2023

EZ: An efficient, charge conserving current deposition algorithm for electromagnetic particle-in-cell simulations

Steiniger

Widera

Computer Physics Communications

et al. 2023

Challenges Porting a C++ Template-Metaprogramming Abstraction Layer to Directive-based Offloading

Kelling

Bastrakov

Debus

et al. 2021

Preprint

HPC systems employ a growing variety of compute accelerators with different architectures and from different vendors. Large scientific applications are required to run efficiently across these systems but need to retain a single code base so as not to stifle development. Directive-based offloading programming models set out to provide the required portability, but for existing codes they represent yet another API to port to. Here, we present our approach to porting the GPU-accelerated particle-in-cell code PIConGPU to OpenACC and OpenMP target by adding two new backends to alpaka, its existing offloading abstraction layer based on C++ template metaprogramming, and avoiding other modifications to the application code. We introduce our approach in the face of conflicts between requirements and available features in the standards as well as practical hurdles posed by immature compiler support.

Metrics and Design of an Instruction Roofline Model for AMD GPUs

Leinhauser

Widera

Bastrakov

et al. 2021

Preprint

Due to the recent announcement of the Frontier supercomputer, many scientific application developers are working to make their applications compatible with AMD (CPU-GPU) architectures, which means moving away from the traditional CPU and NVIDIA-GPU systems. Due to the current limitations of profiling tools for AMD GPUs, this shift leaves a void in how to measure application performance on AMD GPUs. In this paper, we design an instruction roofline model for AMD GPUs using AMD's ROCProfiler and a benchmarking tool, BabelStream (the HIP implementation), as a way to measure an application's performance in instructions and memory transactions on new AMD hardware. Specifically, we create instruction roofline models for a case study scientific application, PIConGPU, an open source particle-in-cell (PIC) simulations application used for plasma and laser-plasma physics on the NVIDIA V100, AMD Radeon Instinct MI60, and AMD Instinct MI100 GPUs. When looking at the performance of multiple kernels of interest in PIConGPU we find that although the AMD MI100 GPU achieves a similar, or better, execution time compared to the NVIDIA V100 GPU, profiling tool differences make comparing performance of these two architectures hard. When looking at execution time, GIPS, and instruction intensity, the AMD MI60 achieves the worst performance out of the three GPUs used in this work.