Roman Wyrzykowski scite author profile

2014

Parallel Computing

Adaptation of fluid model EULAG to graphics processing unit architecture

Concurrency and Computation

Ciznicki

Rosa

et al. 2014

SUMMARYThe goal of this study is to adapt the multiscale fluid solver EULerian or LAGrangian framewrok (EULAG) to future graphics processing units (GPU) platforms. The EULAG model has the proven record of successful applications, and excellent efficiency and scalability on conventional supercomputer architectures. Currently, the model is being implemented as the new dynamical core of the COSMO weather prediction framework. Within this study, two main modules of EULAG, namely the multidimensional positive definite advection transport algorithm (MPDATA) and the variational generalized conjugate residual, elliptic pressure solver Generalized Conjugate Residual (GCR) are analyzed and optimized. In this paper, a method is proposed, which ensures a comprehensive analysis of the resource consumption including registers, shared, and global memories. This method allows us to identify bottlenecks of the algorithm, including data transfers between host and global memory, global and shared memories, as well as GPU occupancy. We put the emphasis on providing a fixed memory access pattern, padding as well as organizing computation in the MPDATA algorithm. The testing and validation of the new GPU implementation have been carried out based on modeling decaying turbulence of a homogeneous incompressible fluid in a triply-periodic cube. Simulations performed using the standard version of EULAG and its new GPU implementation give similar solutions. Preliminary results show a promising increase in terms of computational efficiency.

show abstract

Correlation of Performance Optimizations and Energy Consumption for Stencil-Based Application on Intel Xeon Scalable Processors

IEEE Trans. Parallel Distrib. Syst.

Olas

et al. 2020

This article provides a comprehensive study of the impact of performance optimizations on the energy efficiency of a real-world CFD application called MPDATA, as well as an insightful analysis of performance-energy interaction of these optimizations with the underlying hardware that represents the first generation of Intel Xeon Scalable processors. Considering the MPDATA iterative application as a use case, we explore the fundamentals of energy and performance analysis for a memory-bound application when exposed to a set of optimization steps that increase the application performance, by improving the operational intensity of code and utilizing resources more efficiently. It is shown that for memory-bound applications, optimizing toward high performance could be a powerful strategy for improving the energy efficiency as well. In fact, for the considered performance optimizations, the energy gain is correlated with the performance gain but with varying degrees. As a result, these optimizations allow improving both performance and energy consumption radically, up to about 10.9 and 8.8 times, respectively. The impact of the Intel AVX-512 SIMD extension on the energy consumption and performance is demonstrated. Also, we discover limitations on the usability of CPU frequency scaling as a tool for balancing energy savings with admissible performance losses.

show abstract

Towards Efficient Decomposition and Parallelization of MPDATA on Hybrid CPU-GPU Cluster

et al. 2014

Assessment of offload-based programming environments for hybrid CPU–MIC platforms in numerical modeling of solidification

Halbiniak

Simulation Modelling Practice and Theory

et al. 2018

Performance Analysis for Stencil-Based 3D MPDATA Algorithm on GPU Architecture

2014

Parallelization of 3D MPDATA Algorithm Using Many Graphics Processors

2015