2019 18th International Symposium on Parallel and Distributed Computing (ISPDC)
DOI: 10.1109/ispdc.2019.000-2

Toward Full GPU Implementation of Fluid-Structure Interaction

Cited by 9 publications (10 citation statements). References 6 publications.
“…This is feasible since the most refined blood cell model has fewer points (discretized membrane) than the maximum allowed number of threads per block (hardware constraint). Keeping all points of a cell within a CUDA block allows us to compute the entire solver time step in one CUDA kernel call and make good use of cache and shared memory [44].…”
Section: Methods
confidence: 99%
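The excerpt describes a one-cell-per-CUDA-block layout. Below is a minimal, hypothetical CUDA sketch of that idea, assuming one thread per membrane vertex and a simple placeholder spring force; the names (cell_step, Vertex, NV) and the force model are illustrative, not taken from the cited code [44].

```cuda
#include <cuda_runtime.h>

// Illustrative vertex record for one membrane point (not the authors' layout).
struct Vertex { float3 pos, vel; };

// Hypothetical per-cell time step: block b owns cell b, thread t owns vertex t.
// NV = vertices per cell; it must stay below the max threads per block (1024).
template <int NV>
__global__ void cell_step(Vertex* cells, float dt)
{
    __shared__ float3 pos[NV];            // the whole cell staged in shared memory

    Vertex* cell = cells + blockIdx.x * NV;
    const int t = threadIdx.x;

    pos[t] = cell[t].pos;                 // one coalesced load from global memory
    __syncthreads();

    // Placeholder spring force from the two ring neighbours of vertex t;
    // neighbour positions come from fast shared memory, not global memory.
    const int l = (t + NV - 1) % NV, r = (t + 1) % NV;
    float3 f;
    f.x = pos[l].x + pos[r].x - 2.0f * pos[t].x;
    f.y = pos[l].y + pos[r].y - 2.0f * pos[t].y;
    f.z = pos[l].z + pos[r].z - 2.0f * pos[t].z;

    // Explicit Euler update; the entire step happens in this single kernel.
    cell[t].vel.x += dt * f.x;  cell[t].vel.y += dt * f.y;  cell[t].vel.z += dt * f.z;
    cell[t].pos.x = pos[t].x + dt * cell[t].vel.x;
    cell[t].pos.y = pos[t].y + dt * cell[t].vel.y;
    cell[t].pos.z = pos[t].z + dt * cell[t].vel.z;
}
```

A launch such as `cell_step<642><<<num_cells, 642>>>(d_cells, dt);` (642 vertices is, for example, a common subdivided-icosahedron membrane size) keeps each cell resident in one block, which is what allows the whole solver step to run in a single kernel call.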
“…Another counter-argument is that some numerical methods, such as the IBM, have a large and complex memory footprint that renders them less GPU-friendly. An earlier attempt [44] to port the whole framework to GPUs could not serve as a justification to move in this direction, the main bottleneck being the irregular memory access patterns of the IBM, even though all computations were performed on just one GPU (a data-locality advantage). Figure 5 presents the execution time per iteration for the hybrid (CPU/GPU) version and the CPU-only version.…”
Section: Performance Analysis
confidence: 99%
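As background to the "irregular memory patterns" remark, here is a hedged CUDA sketch of the IBM interpolation step: each membrane point gathers fluid velocity from a data-dependent 4×4×4 stencil of lattice nodes, so adjacent threads read scattered, uncoalesced addresses. The kernel, its names, and the AoS layout are assumptions for illustration; only Peskin's 4-point delta kernel is standard.

```cuda
#include <cuda_runtime.h>

// Peskin's 4-point discrete delta kernel (1D factor) -- standard in the IBM.
__device__ float weight(float r)
{
    r = fabsf(r);
    if (r < 1.0f) return 0.125f * (3.0f - 2.0f*r + sqrtf( 1.0f + 4.0f*r - 4.0f*r*r));
    if (r < 2.0f) return 0.125f * (5.0f - 2.0f*r - sqrtf(-7.0f + 12.0f*r - 4.0f*r*r));
    return 0.0f;
}

// Hypothetical IBM velocity interpolation (gather). u is the fluid velocity on
// an NX x NY x NZ lattice (AoS layout for brevity); points holds the current
// membrane-point positions in lattice units.
__global__ void ibm_interpolate(const float3* __restrict__ u,
                                const float3* __restrict__ points,
                                float3* __restrict__ point_vel,
                                int n_points, int NX, int NY, int NZ)
{
    int p = blockIdx.x * blockDim.x + threadIdx.x;
    if (p >= n_points) return;

    float3 x = points[p];
    int i0 = (int)floorf(x.x) - 1;        // corner of the 4x4x4 support stencil
    int j0 = (int)floorf(x.y) - 1;
    int k0 = (int)floorf(x.z) - 1;

    float3 v = make_float3(0.0f, 0.0f, 0.0f);
    for (int k = 0; k < 4; ++k)
      for (int j = 0; j < 4; ++j)
        for (int i = 0; i < 4; ++i) {
            int I = i0 + i, J = j0 + j, K = k0 + k;
            if (I < 0 || I >= NX || J < 0 || J >= NY || K < 0 || K >= NZ) continue;
            // Data-dependent address: neighbouring threads gather from
            // unrelated cache lines, hence the uncoalesced access pattern.
            int idx = (K * NY + J) * NX + I;
            float w = weight(x.x - I) * weight(x.y - J) * weight(x.z - K);
            v.x += w * u[idx].x;  v.y += w * u[idx].y;  v.z += w * u[idx].z;
        }
    point_vel[p] = v;
}
```

Because the stencil corner depends on where each point currently sits, the addresses cannot be known at compile time, which is exactly what makes the IBM less GPU-friendly than the regular, stencil-on-a-grid LBM streaming step.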
“…Nvidia GPUs, CPUs and Nvidia GPUs [16-26], or mobile SoCs [116,117]; only a few use OpenCL [5-15]. With FluidX3D also being implemented in OpenCL, we are able to benchmark our code across a large variety of hardware, from the world's fastest data-center GPUs, through gaming GPUs and CPUs, to the GPUs of mobile-phone ARM SoCs.…”
Section: Memory and Performance Comparison
confidence: 99%
“…Nevertheless, only a few papers [17,32,33,49,57,61] provide some comparison of how floating-point formats affect the accuracy of the LBM, and they mostly find only insignificant differences between FP64 and FP32, except at very low velocity, where floating-point round-off leads to spontaneous symmetry breaking. Besides the question of accuracy, a quantitative performance comparison across different hardware microarchitectures is missing, as the vast majority of LBM software is written either only for CPUs [62-74] or only for Nvidia GPUs, or for CPUs and Nvidia GPUs [16-26].…”
Section: Introduction
confidence: 99%
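To make the round-off claim concrete, a small sketch, assuming the common second-order LBM equilibrium f_eq = w·ρ·(1 + 3(c·u) + 4.5(c·u)² − 1.5(u·u)): at u ≈ 1e-9 the term 3(c·u) falls below FP32 machine epsilon relative to the leading 1, so the velocity contribution is rounded away entirely, while FP64 retains it. The function below is illustrative, not from any of the cited codes.

```cuda
#include <cstdio>

// Generic second-order LBM equilibrium for one lattice direction:
//   f_eq = w * rho * (1 + 3(c.u) + 4.5(c.u)^2 - 1.5(u.u))
// Templated on the floating-point type so FP32 and FP64 can be compared.
template <typename T>
__host__ __device__ T feq(T w, T rho, T cu, T uu)
{
    return w * rho * (T(1) + T(3)*cu + T(4.5)*cu*cu - T(1.5)*uu);
}

int main()
{
    const double u = 1e-9;              // very low velocity in lattice units
    const double w = 1.0 / 18.0, rho = 1.0;

    double f64 = feq<double>(w, rho, u, u*u);
    float  f32 = feq<float>((float)w, (float)rho, (float)u, (float)(u*u));

    // In FP32, 1 + 3e-9 rounds back to exactly 1 (machine epsilon ~1.2e-7),
    // so the velocity contribution vanishes; FP64 still resolves it.
    std::printf("FP64: %.17g\n", f64);
    std::printf("FP32: %.9g\n", (double)f32);   // prints exactly w * rho
    return 0;
}
```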
“…Yao et al. (2017) accelerated the multiple-relaxation-time LBM model, computed an 896 × 768 × 4 fluid flow on six GPUs, and obtained a speedup of 95×. Bény et al. (2019) used CUDA to simulate fluid-structure interaction problems. The CPU code used for comparison was executed with the high-performance LBM library Palabos (Latt, 2009).…”
Section: Introduction
confidence: 99%