Scalability of Incompressible Flow Computations on Multi-GPU Clusters Using Dual-Level and Tri-Level Parallelism

Jacobsen, Dana A.; Şenocak, İnanç

doi:10.2514/6.2011-947

Cited by 10 publications

(4 citation statements)

References 28 publications

(33 reference statements)

“…CPU (central processing unit) + Intel Xeon-phi co-processors were used for heterogeneous parallel computing employing MPI, OpenMP, and offload programming model. CUDA, OpenMP, and hybrid OpenMP + CUDA-based parallelization for in-house CFD codes is also reported in the literature (Simmendinger and Kügeler 2010; Kafui et al 2011; Xu et al 2014; Jacobsen and Senocak 2011). Review articles by Afzal et al (2017) and Pinto et al (2016, 2017) provide a detailed insight into parallel computing strategies for different CFD applications.…”

Section: Introduction (mentioning)

confidence: 95%

Parallel performance analysis of coupled heat and fluid flow in parallel plate channel using CUDA

2020

Heat transfer analysis coupled with fluid flow is important in many real-world applications, ranging from micro-channels to spacecraft. Numerical prediction of thermal and fluid flow behavior, using either computational fluid dynamics software or in-house codes, has become very common. One of the major issues in numerical analysis is the immense computational time required for repeated analysis. This article discusses the technique applied to parallelize an in-house generic code using the CUDA and OpenMP paradigms. The parallelized finite-volume method (FVM) code is analyzed for different boundary conditions. Two GPUs (graphics processing units) are used for parallel execution. Of the four functions in the code (U, V, P, and T), only the P function is parallelized using CUDA, as it consumes 91% of the computational time; the remaining functions are parallelized using OpenMP. Parallel performance analysis is carried out for 400, 625, and 900 threads launched from the host. The improvement in speedup using CUDA, compared with the speedup using complete OpenMP parallelization on different computing machines, is also provided. The parallel efficiency of the FVM code is also evaluated for different grid sizes, Reynolds numbers, internal flow, and external flow. The GPU is found to provide immense speedup and to largely outperform OpenMP. Parallel execution on the GPU gives results in an acceptable amount of time. The parallel efficiency is found to be close to 90% for internal flow and 10% for external flow.

“…In this approach, each MPI process solves the problem on a sub-domain using the GPU it is associated with. Such approaches can be found in Komatitsch et al (2010); Jacobsen and Senocak (2011); Lai et al (2019); Viñas et al (2013); Turchetto et al (2020). This paper describes a methodology for porting a finite volume solver for the SWE on a multi-GPU architecture.…”

Section: Introduction (mentioning)

confidence: 99%

Multi-GPU implementation of a time-explicit finite volume solver using CUDA and a CUDA-Aware version of OpenMPI with application to shallow water flows

Delmas, Soulaïmani

2022

Computer Physics Communications
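The approach quoted in the statement above, where each MPI process solves the problem on a sub-domain using the GPU it is associated with, can be illustrated with a minimal sketch. It is plain Python with no MPI or CUDA calls and is not taken from any of the cited papers. The function name `decompose_1d`, the one-dimensional slab split, and the `rank % gpus_per_node` device mapping are all assumptions made for illustration.

```python
def decompose_1d(n_cells, n_ranks, gpus_per_node):
    """Split n_cells contiguous cells across n_ranks processes.

    Hypothetical helper: returns one dict per rank with its cell range
    [start, stop), an assumed local GPU index, and its neighbour ranks
    (the processes it would exchange halo cells with).
    """
    base, extra = divmod(n_cells, n_ranks)
    layout = []
    start = 0
    for rank in range(n_ranks):
        # The first `extra` ranks take one additional cell each,
        # so sub-domain sizes differ by at most one.
        size = base + (1 if rank < extra else 0)
        layout.append({
            "rank": rank,
            # Assumed mapping: ranks are placed round-robin on a node's GPUs.
            "gpu": rank % gpus_per_node,
            "start": start,
            "stop": start + size,
            "left": rank - 1 if rank > 0 else None,
            "right": rank + 1 if rank < n_ranks - 1 else None,
        })
        start += size
    return layout


if __name__ == "__main__":
    for sub in decompose_1d(n_cells=10, n_ranks=4, gpus_per_node=2):
        print(sub)
```

In a real multi-GPU code, each rank would bind to its device through the CUDA runtime and exchange halo cells with its `left` and `right` neighbours through MPI; those steps are deliberately left out of this sketch.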

“…Since appearance, GPU has shown distinctive prospects across a large range of fields in practice, for instance, artificial intelligence, deep learning, molecular dynamics, quantum chemistry, high-energy physics, and likewise, in CFD applications. Researchers have made the technology of extension mature from single to several GPUs and even clusters [6][7][8], including the different speedups between explicit and implicit schemes [9], the variance among structured, unstructured and hybrid grids [10,11], the influence of single and double precision [12], as well as high-order schemes and high-fidelity methods attracting increasing attention [13][14][15][16][17][18]. Contributed by hardware's development, GPU has possessed the power of simulating more complicated problems, such as turbulence, where LES was studies earlier [19,20] but DNS was still in the infancy [21][22][23][24].…”

Section: Introduction (mentioning)

confidence: 99%

Optimization and acceleration of flow simulations for CFD on CPU/GPU architecture

Jiang, Zhou, et al.

2019

J. Braz. Soc. Mech. Sci. Eng.

With the increasing demand for computational power in computational fluid dynamics (CFD), graphics processing units (GPUs), with their great floating-point computing capability, play an increasingly important role. This work explores the porting of an Euler solver from central processing units (CPUs) to three different CPU/GPU heterogeneous hardware platforms using MUSCL and NND schemes, and then investigates the computational acceleration of a one-dimensional (1D) Riemann problem and two-dimensional (2D) flow past a forward-facing step. The working manner of heterogeneous systems is first introduced in terms of hardware structures, memory models and programming methods. Subsequently, the three heterogeneous methods employed in the study are presented in detail; porting all parts of the solver loop to the GPU gave the best performance among them. Several optimization strategies suitable for the solver were adopted to achieve substantial execution speedups, although the use of shared memory on the GPU is relatively rarely reported in the CFD literature. Finally, the simulation of the 1D Riemann problem verified the reliability of the modified GPU codes and demonstrated the strong ability of both schemes to capture discontinuities. The two cases, with their 1D computational domains discretized into 10,000 cells, both achieved a speedup exceeding 25 compared with execution on a single-core CPU. In the simulation of the 2D step flow, the highest speedups were 260 for the MUSCL scheme with an 800 × 400 mesh and 144 for the NND scheme with a 400 × 200 computational domain.
