A GPU-accelerated implicit meshless method for compressible flows

Zhang, Jia-Le; Ma, Zhihua; Chen, Hongquan; Cao, Cheng

doi:10.1016/j.jcp.2018.01.037

Cited by 21 publications

(8 citation statements)

References 31 publications

“…Emelyanov et al [21] discussed the popular CFD benchmark solution of the flow over a smooth flat plate on a GPU with various meshes, and the speedup reached more than 46 times. Zhang et al [22] performed an implicit meshless method for compressible flow on an NVIDIA GTX TITAN GPU, and the solution agrees well with experimental results.…”

Section: Introduction (mentioning)

confidence: 64%

A Multi‐GPU Parallel Algorithm in Hypersonic Flow Computations

Lai, Tian, et al. 2019

Mathematical Problems in Engineering

Computational fluid dynamics (CFD) plays an important role in the optimal design of aircraft and the analysis of complex flow mechanisms in the aerospace domain. The graphics processing unit (GPU) has a strong floating-point operation capability and a high memory bandwidth in data parallelism, which brings great opportunities for CFD. A cell-centred finite volume method is applied to solve the three-dimensional compressible Navier–Stokes equations on structured meshes, with an upwind AUSM+UP numerical scheme for space discretization and a four-stage Runge–Kutta method for time discretization. Compute unified device architecture (CUDA) is used as a parallel computing platform and programming model for GPUs, which reduces the complexity of programming. The main purpose of this paper is to design an extremely efficient multi-GPU parallel algorithm based on MPI+CUDA to study the hypersonic flow characteristics. Solutions of hypersonic flow over an aerospace plane model are provided at different Mach numbers. The agreement between numerical computations and experimental measurements is favourable. Acceleration performance of the parallel platform is studied with a single GPU, two GPUs, and four GPUs. For the single-GPU implementation, the speedup reaches 63 for the coarser mesh and 78 for the finest mesh. GPUs are better suited for compute-intensive tasks than traditional CPUs. For multi-GPU parallelization, the speedup of four GPUs reaches 77 for the coarser mesh and 147 for the finest mesh; this is far greater than the acceleration achieved by a single GPU or two GPUs. The multi-GPU parallel algorithm is therefore promising for hypersonic flow computations.

“…24) present more inefficient when the computation domain gets larger, especially for time-marching kernel. Consequently, for the same scheme, speedups keep declining when mesh sizes increase for both method 3 and optimization of reduction, while speedups should improve with the increasing mesh size under certain [9,16,39]. The difference mainly comes from limited performance of time-marching kernel, whose increased multiple of computation time exceeds that of meshes, leading to the significant decreasing performance.…”

Section: D Flow Past a Forward-facing Step (mentioning)

confidence: 99%

Optimization and acceleration of flow simulations for CFD on CPU/GPU architecture

Jiang, Zhou, et al. 2019

J Braz. Soc. Mech. Sci. Eng.

With the increasing demand for computational power in the computational fluid dynamics (CFD) field, graphics processing units (GPUs), with their great floating-point computing capability, play an increasingly important role. This work explores the porting of an Euler solver from central processing units (CPUs) to three different CPU/GPU heterogeneous hardware platforms using the MUSCL and NND schemes, and then investigates the computational acceleration of a one-dimensional (1D) Riemann problem and a two-dimensional (2D) flow past a forward-facing step. The working manner of heterogeneous systems is first introduced in terms of hardware structures, memory models and programming methods. Three different heterogeneous methods employed in the study are then presented in detail; of these, porting all parts of the solver loop to the GPU performed best. Several optimization strategies suitable for the solver were adopted to achieve substantial execution speedups; the use of shared memory on the GPU has been reported relatively rarely in the CFD literature. Finally, the simulation of the 1D Riemann problem verified the reliability of the modified codes on the GPU and demonstrated the strong ability of both schemes to capture discontinuities. Both cases, with 1D computational domains discretized into 10,000 cells, achieved a speedup exceeding 25 relative to execution on a single-core CPU. In the simulation of the 2D step flow, the highest speedups were 260 for the MUSCL scheme with an 800 × 400 mesh and 144 for the NND scheme with a 400 × 200 computational domain.

“…As described in Algorithm 1, a mass of independent arithmetic operations associated with the dominated or non-dominated identifications (see Algorithm 1, line 6) are proved to be time-consuming. Fortunately, such tasks are mostly weak-dependent compute-intensive and are very suitable for GPU parallel architecture [39][40][41]. Therefore, such a kind of computation is implemented on the GPU to achieve acceleration.…”

Section: GPU-accelerated Infill Criterion for the MOEGO Algorithm (mentioning)

confidence: 99%

GPU-Accelerated Infill Criterion for Multi-Objective Efficient Global Optimization Algorithm and Its Applications

Zhang, Chen, et al. 2022

Applied Sciences

In this work, a novel multi-objective efficient global optimization (EGO) algorithm, namely GMOEGO, is presented by proposing an approach of available threads’ multi-objective infill criterion. The work applies the hypervolume-based expected improvement criterion to enhance the Pareto solutions in terms of their accuracy and their distribution on the Pareto front, and the values of the hypervolume improvement (HVI) are approximated by counting Monte Carlo sampling points on a modern GPU (graphics processing unit) architecture. Compared with traditional methods, such as slice-based hypervolume integration, the programming complexity of the present approach is greatly reduced because it relies on such simple counting-like operations. That is, the calculation of the HVI, which has proven to be the most time-consuming part when there are many objectives, can be light to implement. Meanwhile, the time consumed by the massive computation associated with this Monte Carlo-based HVI approximation (MCHVI) is greatly reduced by parallelizing it on the GPU. A set of mathematical function cases and a real engineering airfoil shape optimization problem from the literature are used to validate the proposed approach. All the results show that the approach is less time-consuming, with a speedup of up to around 13.734 times achieved when appropriate Pareto solutions are captured.
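
The counting idea behind a Monte Carlo hypervolume improvement estimate can be illustrated with a minimal CPU-only sketch. This is not the GMOEGO implementation: it assumes a minimization problem, uniform sampling in a box bounded by a reference point, and a hypothetical function name (`mc_hypervolume_improvement`) and parameters (`front`, `candidate`, `ref`, `lower`) chosen only for illustration. Each sample is independent of the others, which is the property that would make such a loop a natural fit for GPU threads.

```python
import random


def mc_hypervolume_improvement(front, candidate, ref, lower, n_samples=20000, seed=0):
    """Estimate the hypervolume improvement of `candidate` over `front`.

    Illustrative assumptions (not taken from the abstract): all objectives
    are minimized, `ref` is the upper corner and `lower` the lower corner
    of the sampling box.
    """
    rng = random.Random(seed)

    # Volume of the axis-aligned sampling box [lower, ref].
    box_volume = 1.0
    for lo, hi in zip(lower, ref):
        box_volume *= hi - lo

    hits = 0
    for _ in range(n_samples):
        sample = [rng.uniform(lo, hi) for lo, hi in zip(lower, ref)]
        # The sample lies in the region dominated by the candidate...
        in_candidate = all(c <= s for c, s in zip(candidate, sample))
        # ...but not in the region already dominated by the current front.
        in_front = any(all(p <= s for p, s in zip(point, sample)) for point in front)
        if in_candidate and not in_front:
            hits += 1

    # Fraction of counted samples times the box volume approximates the HVI.
    return box_volume * hits / n_samples


if __name__ == "__main__":
    front = [(1.0, 3.0), (3.0, 1.0)]
    estimate = mc_hypervolume_improvement(front, (2.0, 2.0), (4.0, 4.0), (0.0, 0.0))
    print(f"estimated HVI: {estimate:.3f}")
```

For the two-point front in the usage example, the exact improvement is the unit square between (2, 2) and (3, 3), so the estimate should be close to 1; the accuracy depends on the number of samples.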
