Accelerating Physical Simulations from a Multicomponent Lattice Boltzmann Method on a Single-Node Multi-GPU Architecture

Duchateau, Julien; Rousselle, François; Maquignon, Nicolas; Roussel, Gilles; Renaud, Christophe

doi:10.1109/3pgcic.2015.41

Cited by 3 publications

(1 citation statement)

References 16 publications


“…This is equivalent to decoupling arithmetic precision and memory precision [84,85]. As a desirable side effect, since the limiting factor regarding compute time is memory bandwidth [12–21,30–45,52–55,59,60,63,64,67,86–88], lower precision DDFs also vastly increase performance. Such a mixed precision variant, where arithmetic is done in FP64 and DDF storage in FP32, has already been used by Refs.…”

Section: Introduction (mentioning)

confidence: 99%

Accuracy and performance of the lattice Boltzmann method with 64-bit, 32-bit, and customized 16-bit number formats

et al. 2022


Fluid dynamics simulations with the lattice Boltzmann method (LBM) are very memory intensive. Alongside reduction in memory footprint, significant performance benefits can be achieved by using FP32 (single) precision compared to FP64 (double) precision, especially on GPUs. Here we evaluate the possibility to use even FP16 and posit16 (half) precision for storing fluid populations, while still carrying arithmetic operations in FP32. For this, we first show that the commonly occurring number range in the LBM is a lot smaller than the FP16 number range. Based on this observation, we develop customized 16-bit formats, based on a modified IEEE-754 and on a modified posit standard, that are specifically tailored to the needs of the LBM. We then carry out an in-depth characterization of LBM accuracy for six different test systems with increasing complexity: Poiseuille flow, Taylor-Green vortices, Karman vortex streets, lid-driven cavity, a microcapsule in shear flow (utilizing the immersed-boundary method), and, finally, the impact of a raindrop (based on a volume-of-fluid approach). We find that the difference in accuracy between FP64 and FP32 is negligible in almost all cases, and that for a large number of cases even 16-bit is sufficient. Finally, we provide a detailed performance analysis of all precision levels on a large number of hardware microarchitectures and show that significant speedup is achieved with mixed FP32/16-bit.
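The idea of decoupling storage precision from arithmetic precision can be illustrated with a minimal sketch. It is not the cited authors' implementation: it uses only the standard IEEE-754 formats available through Python's `struct` module (not the customized 16-bit formats from the abstract), and the helper names, the sample values, and the simple relaxation step toward `feq` are all assumptions chosen for illustration.

```python
import struct

def round_trip(value, fmt):
    """Store a value in a given format and load it back.
    fmt: 'e' = IEEE FP16, 'f' = IEEE FP32, 'd' = IEEE FP64."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]

def relax_step(ddfs, feq, omega, fmt):
    """Hypothetical update: load each stored DDF, relax it toward feq
    using Python floats (FP64 arithmetic), then store it again in fmt."""
    result = []
    for f, fe in zip(ddfs, feq):
        f_loaded = round_trip(f, fmt)               # memory precision
        f_new = f_loaded + omega * (fe - f_loaded)  # arithmetic precision
        result.append(round_trip(f_new, fmt))       # memory precision
    return result

def max_rel_error(result, reference):
    """Largest relative deviation from the reference values."""
    return max(abs(r - ref) / abs(ref) for r, ref in zip(result, reference))

# Illustrative values only (assumed, not taken from the paper).
ddfs = [4 / 9, 1 / 9, 1 / 9, 1 / 36]
feq = [0.45, 0.12, 0.10, 0.03]
omega = 0.8

reference = relax_step(ddfs, feq, omega, "d")
err16 = max_rel_error(relax_step(ddfs, feq, omega, "e"), reference)
err32 = max_rel_error(relax_step(ddfs, feq, omega, "f"), reference)
print(f"FP16 storage error: {err16:.2e}, FP32 storage error: {err32:.2e}")
```

Only the two `round_trip` calls depend on the storage format, which is why the storage format can be changed without touching the arithmetic.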


An Out-of-Core Method for Physical Simulations on a Multi-GPU Architecture Using Lattice Boltzmann Method

Duchateau, Rousselle, Maquignon et al. 2016

2016 Intl IEEE Conferences on Ubiquitous Intelligence & Computing, Advanced and Trusted Computing, Scalable Computing and C


Esoteric Pull and Esoteric Push: Two Simple In-Place Streaming Schemes for the Lattice Boltzmann Method on GPUs

Lehmann

2022

Computation


I present two novel thread-safe in-place streaming schemes for the lattice Boltzmann method (LBM) on graphics processing units (GPUs), termed Esoteric Pull and Esoteric Push, that result in the LBM only requiring one copy of the density distribution functions (DDFs) instead of two, greatly reducing memory demand. These build upon the idea of the existing Esoteric Twist scheme, to stream half of the DDFs at the end of one stream-collide kernel and the remaining half at the beginning of the next and offer the same beneficial properties over the AA-Pattern scheme—reduced memory bandwidth due to implicit bounce-back boundaries and the possibility of swapping pointers between even and odd time steps. However, the streaming directions are chosen in a way that allows the algorithm to be implemented in about one tenth the amount of code, as two simple loops, and is compatible with all velocity sets and suitable for automatic code-generation. The performance of the new streaming schemes is slightly increased over Esoteric Twist due to better memory coalescence. Benchmarks across a large variety of GPUs and CPUs show that for most dedicated GPUs, performance differs only insignificantly from the One-Step Pull scheme; however, for integrated GPUs and CPUs, performance is significantly improved. The two proposed algorithms greatly facilitate modifying existing code to allow for in-place streaming, even with extensions already in place, such as was demonstrated for the Free Surface LBM implementation FluidX3D. Their simplicity, together with their ideal performance characteristics, may enable more widespread adoption of in-place streaming across LBM GPU code.
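The memory saving from keeping one copy of the DDFs instead of two can be estimated with a back-of-the-envelope calculation. The sketch below is only an illustration: the grid size, the D3Q19 velocity set, the FP32 storage, and the helper name `ddf_memory_bytes` are assumptions, not figures from the paper, and it counts DDF storage only.

```python
def ddf_memory_bytes(cells, velocities, bytes_per_ddf, copies):
    """Memory needed for the density distribution functions alone."""
    return cells * velocities * bytes_per_ddf * copies

cells = 256 ** 3      # assumed grid of 256 x 256 x 256 lattice cells
velocities = 19       # assumed D3Q19 velocity set
bytes_per_ddf = 4     # assumed FP32 storage

two_copy = ddf_memory_bytes(cells, velocities, bytes_per_ddf, copies=2)
in_place = ddf_memory_bytes(cells, velocities, bytes_per_ddf, copies=1)

print(f"two copies: {two_copy / 2**30:.2f} GiB")
print(f"in-place:   {in_place / 2**30:.2f} GiB")
```

Under these assumptions the DDF storage is halved; other per-cell fields such as density, velocity, and flags would add to both totals equally.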
