2020
DOI: 10.1134/s1995080220080077

Acceleration of NOISEtte Code for Scale-Resolving Supercomputer Simulations of Turbulent Flows

Citations: cited by 9 publications (6 citation statements)
References: 31 publications
“…In summary, the "business lunch" version with combined functions and mixed accuracy is about twice as fast on CPUs as the base version. The use of mixed accuracy speeds up the code about 1.6-1.7 times and saves memory nearly twice with no effect on accuracy of results, as shown in [12]. The OpenCL version of the code gives significant acceleration, one GPU performs like 7-8 modern multicore CPUs, which is in good agreement with the memory bandwidth ratio between CPU and GPU.…”
Section: Discussion (mentioning)
confidence: 60%
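The memory saving attributed to mixed accuracy in the statement above can be illustrated with a minimal sketch. It is not taken from NOISEtte: the coefficient values, the names `coeffs_single`, `coeffs_double`, and `dot`, and the toy dot product are all assumptions chosen for illustration. The idea shown is storing a heavy coefficient array in single precision while doing the arithmetic in double precision.

```python
from array import array

# Hypothetical coefficient data; the values are illustrative only.
values = [1.0 / (i + 1) for i in range(1000)]
vector = [1.0] * len(values)

coeffs_double = array("d", values)  # stored in double precision
coeffs_single = array("f", values)  # stored in single precision (rounded)


def dot(coeffs, x):
    # Entries read back from an array are Python floats (doubles),
    # so the products and the running sum are computed in double precision.
    total = 0.0
    for c, xi in zip(coeffs, x):
        total += c * xi
    return total


bytes_double = coeffs_double.itemsize * len(coeffs_double)
bytes_single = coeffs_single.itemsize * len(coeffs_single)

reference = dot(coeffs_double, vector)
mixed = dot(coeffs_single, vector)
relative_error = abs(mixed - reference) / abs(reference)

print(bytes_double, bytes_single, relative_error)
```

On common platforms the single-precision array takes half the storage, and in this toy sum the relative error stays near single-precision rounding. Whether that level of error is acceptable in a real solver depends on the application; the cited statement defers that question to its reference [12].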
“…Furthermore, since GPU memory is very limited, we developed a simplified version of the viscous fluxes calculation method by reducing the number of coefficients and using mixed single and double precision floating point formats. Details about this new method to compute viscous fluxes much cheaper can be found in [12]. In resource-intensive applications, this "business-lunch" configuration works about 15-20% faster than the baseline full version, called "playground".…”
Section: Simplification and Improvement Of Performance (mentioning)
confidence: 99%
“…The heterogeneous parallel algorithm is implemented in the NOISEtte code [8]. Further details on parallel algorithm, adaptation of the numerical algorithm and software implementation to GPU computing can be found in [7][8][9]. Examples of parallel speedups are shown in Fig.…”
Section: Parallel Computing (mentioning)
confidence: 99%
“…minimization of work-item tasks to increase occupancy of compute units; mixed single and double floating point precision (single precision in some heavy arrays of discrete operator coefficients and in the linear solver) to reduce memory consumption and memory traffic, of course without affecting the accuracy of the results [18]; reordering of mesh objects (block Cuthill-McKee, lexicographical sorting) to improve memory access locality; new numerical algorithms with reduced memory consumption [18,19].…”
Section: Parallel Implementation (mentioning)
confidence: 99%
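The reordering of mesh objects mentioned in the last statement can be sketched with the plain Cuthill-McKee algorithm. This is an illustrative assumption, not the block variant the statement names and not code from the cited work: the functions `cuthill_mckee` and `bandwidth` and the small example graph are hypothetical. The sketch runs a breadth-first traversal that visits low-degree neighbours first, so that connected objects end up with nearby indices.

```python
from collections import deque


def cuthill_mckee(adjacency):
    """Return a Cuthill-McKee ordering of an undirected graph.

    adjacency maps each node to an iterable of its neighbours.
    """
    degree = {v: len(set(nbrs)) for v, nbrs in adjacency.items()}
    visited = set()
    order = []
    # Handle every connected component, starting from a minimum-degree node.
    for start in sorted(adjacency, key=lambda v: (degree[v], v)):
        if start in visited:
            continue
        visited.add(start)
        queue = deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            # Enqueue unvisited neighbours in order of increasing degree.
            unvisited = set(adjacency[v]) - visited
            for w in sorted(unvisited, key=lambda u: (degree[u], u)):
                visited.add(w)
                queue.append(w)
    return order


def bandwidth(adjacency, order):
    """Largest index distance between two connected nodes under an ordering."""
    position = {v: i for i, v in enumerate(order)}
    return max(
        (abs(position[v] - position[w]) for v in adjacency for w in adjacency[v]),
        default=0,
    )


# Hypothetical mesh connectivity: a chain 0-3-1-4-2 with scrambled labels.
mesh = {0: [3], 3: [0, 1], 1: [3, 4], 4: [1, 2], 2: [4]}
natural_order = sorted(mesh)
new_order = cuthill_mckee(mesh)
print(bandwidth(mesh, natural_order), bandwidth(mesh, new_order))
```

A smaller bandwidth means that neighbouring objects sit closer together in memory, which is the locality effect the statement refers to.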