Hanli Bai scite author profile

CPU/GPU computing allows scientists to tremendously accelerate their numerical codes. In this paper, we port and optimize a double-precision alternating direction implicit (ADI) solver for the three-dimensional compressible Navier-Stokes equations from our in-house computational fluid dynamics (CFD) software to a heterogeneous platform. First, we implement a full GPU version of the ADI solver to remove redundant data transfers between the CPU and GPU, and then design two fine-grained schemes, namely "one-thread-one-point" and "one-thread-one-line", to maximize performance. Second, we present a dual-level parallelization scheme using the CPU/GPU collaborative model to exploit the computational resources of both the multi-core CPUs and the many-core GPUs within the heterogeneous platform. Finally, because the memory on a single node becomes inadequate as the simulation size grows, we present a tri-level hybrid programming pattern, MPI-OpenMP-CUDA, that merges fine-grained parallelism using OpenMP and CUDA threads with coarse-grained parallelism using MPI for inter-node communication. We also propose a strategy to overlap computation with communication using the advanced features of CUDA and MPI programming. We obtain a speedup of 6.0 for the ADI solver on one Tesla M2050 GPU compared with two Xeon X5670 CPUs. Scalability tests show that our implementation can offer significant performance improvement on the heterogeneous platform.
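As a rough illustration of the line-wise work an ADI sweep involves, the sketch below solves independent tridiagonal systems, one per grid line, with the Thomas algorithm. It is written in plain Python for readability and is not the authors' implementation: the names `thomas_solve` and `sweep_lines` are hypothetical, a scalar tridiagonal system stands in for whatever (possibly block) systems the actual solver handles, and reading "one-thread-one-line" as one GPU thread per grid line is an assumption based on the scheme's name.

```python
def thomas_solve(lower, diag, upper, rhs):
    """Solve one tridiagonal system with the Thomas algorithm.

    lower[i], diag[i], upper[i] are the sub-, main- and super-diagonal
    entries of row i; lower[0] and upper[-1] are ignored.
    """
    n = len(diag)
    c_prime = [0.0] * n
    d_prime = [0.0] * n

    # Forward elimination.
    c_prime[0] = upper[0] / diag[0]
    d_prime[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c_prime[i - 1]
        c_prime[i] = upper[i] / denom
        d_prime[i] = (rhs[i] - lower[i] * d_prime[i - 1]) / denom

    # Back substitution.
    x = [0.0] * n
    x[-1] = d_prime[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d_prime[i] - c_prime[i] * x[i + 1]
    return x


def sweep_lines(lines):
    """Solve every grid line independently.

    Each element of `lines` is a (lower, diag, upper, rhs) tuple.
    Assumption: in a one-thread-one-line GPU mapping, each iteration
    of this loop would be handled by a separate thread.
    """
    return [thomas_solve(*line) for line in lines]


if __name__ == "__main__":
    # Hypothetical 3-point line whose exact solution is [1, 2, 3].
    line = ([0.0, -1.0, -1.0], [2.0, 2.0, 2.0], [-1.0, -1.0, 0.0], [0.0, 0.0, 4.0])
    print(sweep_lines([line, line]))
```

The forward elimination and back substitution are sequential along a line, which is why the parallelism in this sketch comes only from the lines being independent of one another.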


Performance Optimization and Comparison of the Alternating Direction Implicit CFD Solver on Multi‐core and Many‐Core Architectures

Deng, Zhao, Bai, et al. 2018. Chin. J. Electron.


Design and development of CEWES software for complex environment wind engineering simulation

Fan, Chen, et al. 2021. IOP Conf. Ser.: Earth Environ. Sci.


The complex environment wind engineering simulation software CEWES was designed and developed under the National Numerical Wind Tunnel Project (NNW). First, based on the characteristics of the physical problem that the software aims to solve, the requirements for developing complex environment wind engineering simulation software are proposed, and three main modules are identified for development: a structured-grid flow field solver, an unstructured-grid flow field solver, and a modeling module for complex terrain and surfaces. Second, appropriate mathematical and physical models and numerical solution algorithms are selected for the flow field solver. The CEWES software uses the finite volume method for discretization with second-order accuracy, solves the RANS equations based on the SIMPLE algorithm, uses the k-ε turbulence model, and supports large-scale parallel computation. Third, the software design was carried out in accordance with the requirements of the CFD solution process and modular programming, focusing on the program architecture, data structures, and subroutine interface design, followed by coding based on the detailed design. Finally, the CEWES software was tested with typical examples. The test results show that the software produces accurate results, supports large-scale parallel computing, and is suitable for wind engineering simulations in complex terrain environments.


Parallelizing a high-order WENO scheme for complicated flow structures on GPU and MIC

Deng, Wang, Bai, et al. 2015


As a conservative, high-order accurate, shock-capturing method, the weighted essentially non-oscillatory (WENO) scheme has been widely used to effectively resolve complicated flow structures in computational fluid dynamics (CFD) simulations. However, using a high-order WENO scheme can be highly time-consuming, which greatly limits the performance of CFD applications. In this paper, we present various parallel strategies based on the latest many-core platforms, such as the NVIDIA Fermi GPU, NVIDIA Kepler GPU and Intel MIC coprocessor, to accelerate a high-order WENO scheme. A comparison of the two GPU generations, Fermi and Kepler, and a cross-platform performance analysis (focusing on the Kepler GPU and MIC) are also discussed in detail. The experiments show that the Kepler GPU offers a clear advantage over the previous Fermi GPU when running exactly the same source code. Furthermore, while the Kepler GPU can be several times faster than MIC when MIC does not utilize the increasingly available SIMD computing power of its Vector Processing Unit (VPU), MIC can provide computing capability equivalent to the Kepler GPU when the VPU is utilized. Our implementations and optimization techniques can serve as case studies for parallelizing high-order schemes on many-core architectures.
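To give a sense of the per-point arithmetic that makes a high-order WENO scheme expensive, the sketch below evaluates one fifth-order WENO reconstruction at a cell interface using the classic Jiang-Shu smoothness indicators. The abstract does not state which WENO variant or order the paper uses, so the fifth-order choice, the function name `weno5_reconstruct`, and the value of `eps` are assumptions made for illustration only.

```python
def weno5_reconstruct(v, eps=1e-6):
    """Left-biased fifth-order WENO value at the interface i+1/2.

    `v` holds five cell averages (v[i-2], v[i-1], v[i], v[i+1], v[i+2]).
    """
    vm2, vm1, v0, vp1, vp2 = v

    # Third-order candidate reconstructions on the three sub-stencils.
    p0 = (2.0 * vm2 - 7.0 * vm1 + 11.0 * v0) / 6.0
    p1 = (-vm1 + 5.0 * v0 + 2.0 * vp1) / 6.0
    p2 = (2.0 * v0 + 5.0 * vp1 - vp2) / 6.0

    # Jiang-Shu smoothness indicators: large where the stencil is not smooth.
    b0 = 13.0 / 12.0 * (vm2 - 2.0 * vm1 + v0) ** 2 + 0.25 * (vm2 - 4.0 * vm1 + 3.0 * v0) ** 2
    b1 = 13.0 / 12.0 * (vm1 - 2.0 * v0 + vp1) ** 2 + 0.25 * (vm1 - vp1) ** 2
    b2 = 13.0 / 12.0 * (v0 - 2.0 * vp1 + vp2) ** 2 + 0.25 * (3.0 * v0 - 4.0 * vp1 + vp2) ** 2

    # Nonlinear weights built from the ideal linear weights 1/10, 6/10, 3/10.
    a0 = 0.1 / (eps + b0) ** 2
    a1 = 0.6 / (eps + b1) ** 2
    a2 = 0.3 / (eps + b2) ** 2
    total = a0 + a1 + a2

    return (a0 * p0 + a1 * p1 + a2 * p2) / total


if __name__ == "__main__":
    print(weno5_reconstruct([1.0, 2.0, 3.0, 4.0, 5.0]))  # smooth (linear) data
    print(weno5_reconstruct([0.0, 0.0, 0.0, 1.0, 1.0]))  # data with a jump
```

Every interface value depends only on five neighbouring cell averages, so interfaces can be evaluated independently of one another, which is the property a many-core port of such a scheme would typically rely on.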


An Experimental Study on Attribute Validity of Code Quality Evaluation Model

et al. 2022


Regarding the practicality of quality evaluation models, the lack of quantitative experimental evaluation hinders the effective use of such models and leaves little guidance for choosing among them. To address this problem, a machine learning-based method for verifying the validity of quality evaluation attributes is proposed, based on the sensitivity of the quality evaluation model to code defects. The method conducts comparative experiments with controlled variables. First, the basic metric elements are extracted; then, they are converted into quality attributes of the software; finally, to verify the quality evaluation model and the effectiveness of medium quality attributes, this paper compares machine learning methods based on quality attributes with those based on text features, and conducts experimental evaluation on two data sets. The results show that, under controlled variables, quality attributes are more effective, leading by 15% with AdaBoostClassifier; when the text feature extraction is increased to 50-150 dimensions, text features outperform quality attributes in the four machine learning algorithms; but once the peak is reached, quality attributes are more stable. This also provides a direction for optimizing the quality model and for using quality assessment in different situations.



Hanli Bai

Kepler GPU vs. Xeon Phi: Performance case study with a high-order CFD application

Runtime prediction of high-performance computing jobs based on ensemble learning

Evaluating Multi-core and Many-Core Architectures through Accelerating an Alternating Direction Implicit CFD Solver

Cpu/Gpu Computing for an Implicit Multi-Block Compressible Navier-Stokes Solver on Heterogeneous Platform

Performance Optimization and Comparison of the Alternating Direction Implicit CFD Solver on Multi‐core and Many‐Core Architectures

Design and development of CEWES software for complex environment wind engineering simulation

Parallelizing a high-order WENO scheme for complicated flow structures on GPU and MIC

An Experimental Study on Attribute Validity of Code Quality Evaluation Model
