40th Fluid Dynamics Conference and Exhibit 2010
DOI: 10.2514/6.2010-5036

Unsteady Turbulent Simulations on a Cluster of Graphics Processors

Abstract: This paper describes the GPU accelerated MBFLO2 multi-block turbulent flow solver completely in double precision using CUDA and the latest generation of GPU processors. On a cluster of 8 Tesla C2050 "Fermi" GPUs and Intel Xeon X5550 "Nehalem" quad-core CPUs, we achieve 9x speedup over the parallel CPU solver or 70x speedup over the serial solver. High performance is obtained by optimizing the data layout on the GPU, optimizing data transfers and using asynchronous memory copies to overlap GPU execution with co…

Citations: cited by 14 publications (6 citation statements)
References: 19 publications
“…Phillips et al [86] developed one of the first GPU solvers capable of simulating turbulence using the k-ω model, extending on the group's previous work porting portions of the existing MBFLO solver to the GPU [85]. In addition, their new solver was capable of running on a cluster of multiple CPU/GPU nodes, using a domain decomposition technique to give each node responsibility for a block of the overall domain.…”
Section: Turbulent Flow (mentioning)
confidence: 99%
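
The domain decomposition mentioned in the statement above can be illustrated with a minimal sketch. This is not the MBFLO2 scheme, which is not described here: it assumes a one-dimensional row of cells and a hypothetical helper `split_domain` that hands each node one contiguous block of nearly equal size.

```python
def split_domain(n_cells, n_nodes):
    """Split cells 0..n_cells-1 into one contiguous block per node.

    Hypothetical helper for illustration only. Returns a list of
    (start, stop) half-open index ranges, one per node.
    """
    if n_nodes <= 0:
        raise ValueError("n_nodes must be positive")
    base, extra = divmod(n_cells, n_nodes)
    blocks = []
    start = 0
    for node in range(n_nodes):
        # The first `extra` nodes take one additional cell each,
        # so block sizes differ by at most one.
        size = base + (1 if node < extra else 0)
        blocks.append((start, start + size))
        start += size
    return blocks


if __name__ == "__main__":
    for node, (start, stop) in enumerate(split_domain(10, 4)):
        print(f"node {node}: cells [{start}, {stop})")
```

In a real multi-block solver the blocks would be two- or three-dimensional and neighbouring blocks would exchange boundary data each step; the sketch only shows the ownership assignment.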
“…The CPU only drove the simulation and passed information between the blocks of the domain, using MPI to transfer information between independent cluster nodes. Phillips et al [86] also improved performance by implementing a novel asynchronous memory transfer using CUDA streams; in their previous work, the GPU remained idle while the CPU transferred memory between different blocks (i.e., subdomains). Here, each block was further divided in half such that the GPU could continue to perform calculations on one half while the CPU transferred memory associated with the other half of the block; this improved performance up to 40%.…”
Section: Turbulent Flow (mentioning)
confidence: 99%
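
The half-block overlap described in the statement above can be sketched as a toy timing model. It is an illustration under stated assumptions, not the authors' implementation: compute and transfer cost are assumed to split evenly between the two halves, the transfer of each half can start only after that half has been computed, and the function names and timings are hypothetical.

```python
def step_time_no_overlap(compute_s, transfer_s):
    """Time for one step when the GPU idles during the whole transfer."""
    return compute_s + transfer_s


def step_time_half_split(compute_s, transfer_s):
    """Time for one step when the block is split into halves A and B.

    Assumption: each half costs half the compute and half the transfer,
    and there is one compute resource and one copy resource.
    """
    half_c, half_t = compute_s / 2, transfer_s / 2
    compute_a_done = half_c
    # Transfer of A and compute of B start together once A is computed.
    transfer_a_done = compute_a_done + half_t
    compute_b_done = compute_a_done + half_c
    # Transfer of B waits for B to be computed and for the copy of A to finish.
    transfer_b_start = max(compute_b_done, transfer_a_done)
    return transfer_b_start + half_t


if __name__ == "__main__":
    compute_s, transfer_s = 1.0, 0.5  # hypothetical timings in seconds
    base = step_time_no_overlap(compute_s, transfer_s)
    split = step_time_half_split(compute_s, transfer_s)
    print(f"no overlap: {base:.2f} s, half split: {split:.2f} s")
```

With these made-up timings the step drops from 1.50 s to 1.25 s. The model is not meant to reproduce the reported improvement of up to 40%, which depends on the real compute-to-transfer ratio and on overlap across consecutive steps.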
“…Manavski et al [43] used GPUs as an accelerator for Smith-Waterman sequence alignment. Phillips et al [44] implemented a multi-block turbulent flow solver in GPU processors.…”
Section: Implementation of Peridynamics in GPU (mentioning)
confidence: 99%