Comparison of implementations of the lattice-Boltzmann method

Mattila, Keijo; Hyväluoma, Jari; Timonen, J.; Rossi, Tuomo

doi:10.1016/j.camwa.2007.08.001

Cited by 54 publications

(27 citation statements)

References 17 publications

“…This is "only" 2 times slower than native optimized implementations on almost similar equipments (AMD® Opteron™ 2 GHz and Intel® Xeon™ 3.2 GHz) [12]. We therefore expect to reach at least similar results if applying our optimization technic to a native implementation.…”

Section: Comparison Of Simple and Optimized Algorithms (mentioning)

confidence: 63%

Lattice Boltzmann Simulation Code Optimization Based on Constant-time Circular Array Shifting

Dethier, Marneffe, Marchot

2011

Procedia Computer Science

Lattice Boltzmann (LB) methods are a class of Computational Fluid Dynamics (CFD) methods for fluid flow simulation. LB simulation codes have high memory and computational requirements: they may involve updating several million floating-point values thousands of times, and therefore require several gigabytes of memory and run for several days. Optimized implementations of LB methods minimize these requirements. An existing method, based on a particular data layout and an associated implementation that shifts arrays in constant time, reduces the execution time of LB simulations and almost minimizes memory usage compared with a naive implementation. In this paper, we show that this method can be further improved, in both memory usage and performance, by slightly modifying the data layout and by using blocking to enhance data locality.
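
The idea behind constant-time circular array shifting, as we read it from the abstract, is that streaming along a lattice direction need not move any data. If each population is stored in its own periodic array, propagation can be carried out by changing a start offset. The sketch below illustrates this for a hypothetical one-dimensional, three-velocity (D1Q3) lattice with periodic boundaries. The class `ShiftedArray` and the function `stream_d1q3` are illustrative names we introduce here. This is not the data layout of Dethier et al., which also has to handle padding and non-periodic boundaries.

```python
class ShiftedArray:
    """Hypothetical periodic 1D array whose contents shift in O(1) by moving an offset."""

    def __init__(self, values):
        self.data = list(values)
        self.n = len(self.data)
        self.offset = 0  # logical index i lives at physical index (i - offset) mod n

    def shift(self, k):
        # Shifting every element k cells to the right only updates the offset;
        # no element of self.data is copied or moved.
        self.offset = (self.offset + k) % self.n

    def __getitem__(self, i):
        return self.data[(i - self.offset) % self.n]

    def __setitem__(self, i, value):
        self.data[(i - self.offset) % self.n] = value

    def to_list(self):
        return [self[i] for i in range(self.n)]


def stream_d1q3(f_rest, f_right, f_left):
    """Periodic streaming for a hypothetical D1Q3 lattice without copying populations."""
    # The rest population (velocity 0) does not move.
    f_right.shift(+1)  # populations with velocity +1 move one cell to the right
    f_left.shift(-1)   # populations with velocity -1 move one cell to the left
    return f_rest, f_right, f_left
```

For example, `ShiftedArray([1, 2, 3, 4])` reads back as `[4, 1, 2, 3]` after `shift(1)`. A collision step still accesses all populations of a logical node through ordinary indexing, so the cost of streaming no longer depends on the lattice size.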

“…This paper also applies the techniques of blocking for repeated iterations and eliminating unnecessary dependencies and branches, resulting in performance improvements of 1.2-1.3 relative to the original code. A similar effort (Mattila et al, 2008) looked at memory addressing schemes and data layouts in LBM codes across multiple platforms and evaluated the results in terms of computational speed, cache performance, and memory consumption, with the conclusion that the optimal approach is dependent on the particular case and the desired trade-off between memory consumption and performance. Wellein et al (2005) also considered data layouts as well as other optimization strategies such as blocking, again across multiple platforms.…”

Section: Introduction (mentioning)

confidence: 99%

Optimization of a Computational Fluid Dynamics Code for the Memory Hierarchy: A Case Study

Hauser, LeBeau

2010

The International Journal of High Performance Computing Applications

With the current shift toward increasing the computational power of a processor by adding cores instead of raising the clock frequency, computational efficiency is becoming increasingly important for computational fluid dynamics codes. This is especially critical for applications that require high throughput. For example, applying computational fluid dynamics simulations to multi-disciplinary design optimization requires a large number of similar simulations with different input parameters, so reducing the runtime of the code can substantially shorten the design process. In our case study, a two-dimensional, block-structured computational fluid dynamics code was optimized for performance on machines with hierarchical memory systems. This paper illustrates the techniques applied to transform an initial version of the code into an optimized version that yielded performance improvements ranging from 10% for very small cases to about 50% for large test cases that did not fit into the cache memory of the target processor. A detailed performance analysis of the code, from the global level down to subroutines and data structures, is presented in this paper. The performance improvements can be explained by a reduction of cache misses at all levels of the memory hierarchy. For the optimized version of the code, L1 cache misses were reduced by about 50%, L2 cache misses by about 80%, and translation lookaside buffer misses by about 90%. The code performance was also evaluated on multi-core processors, where efficiency is especially important when several instances of an application run simultaneously. In this setting, the most optimized version (a blocked version of the optimized code) maintained efficiency better than the unblocked version as more cores were activated. This illustrates that optimizing cache performance may become increasingly important as the number of cores per processor continues to rise.
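
Both the citation statement above and this abstract credit blocking with improving cache reuse. The sketch below is a generic illustration of spatial loop tiling, not the code of Hauser and LeBeau. It applies a 5-point averaging stencil to a 2D grid twice: once with a plain row-by-row sweep, and once tile by tile with a hypothetical tile size `block`. The stencil and the function names are our own assumptions. Both traversals compute identical results, and only the order in which cells are visited changes. In pure Python this shows the transformation but not the speed-up, which appears in compiled code on grids that exceed the cache.

```python
def stencil_point(src, i, j):
    """5-point average at interior cell (i, j)."""
    return (src[i][j] + src[i - 1][j] + src[i + 1][j] + src[i][j - 1] + src[i][j + 1]) / 5.0


def sweep_naive(src):
    """Row-by-row sweep over all interior cells."""
    n, m = len(src), len(src[0])
    dst = [row[:] for row in src]  # boundary cells are copied unchanged
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            dst[i][j] = stencil_point(src, i, j)
    return dst


def sweep_blocked(src, block=4):
    """Same update, but interior cells are visited in block x block tiles."""
    n, m = len(src), len(src[0])
    dst = [row[:] for row in src]
    for ii in range(1, n - 1, block):
        for jj in range(1, m - 1, block):
            # Finish one tile before moving on; in compiled code this is meant to
            # keep the tile's working set resident in cache.
            for i in range(ii, min(ii + block, n - 1)):
                for j in range(jj, min(jj + block, m - 1)):
                    dst[i][j] = stencil_point(src, i, j)
    return dst
```

Because each cell is updated with exactly the same arithmetic, `sweep_naive(grid) == sweep_blocked(grid, block=3)` holds exactly for any rectangular grid of floats. In practice the tile size would be tuned to the cache size of the target processor.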

“…On the other hand, LBM contains two distinct steps: streaming and collision. In streaming step, data are coupled to and from adjacent lattice nodes, while in collision step, data are usually independent of the underlying lattice type and computations are performed in this step (Mattila et al 2008). …”

Section: Introduction (mentioning)

confidence: 99%

A New Approach to Reduce Memory Consumption in Lattice Boltzmann Method on GPU

Sheida, Taeibi-Rahni, Esfahanian

2017

JAFM

Several efforts have been made to remedy the computational-performance shortcomings of the LBM. In this work, a new algorithm is introduced to reduce memory consumption. In the past, most LBM developers have not paid enough attention to retaining the simplicity of the LBM in their modified versions, whereas this was one of the main concerns in developing the present algorithm. Note, however, that the new algorithm also has a drawback: in addition to reducing memory, it loses some computational efficiency because of frequent accesses to main memory. To overcome this, an optimization approach is introduced that restores the computational efficiency of the original two-step, two-lattice LBM. This is achieved through a trade-off between memory reduction and computational performance: while keeping a suitable computational efficiency, the memory reduction reaches about 33% in D2Q9 and 42% in D3Q19. The approach has also been implemented on a graphics processing unit (GPU). Given the limited onboard memory of GPUs, the advantage of the new algorithm is even greater there (39% in D2Q9 and 45% in D3Q19). Because of the higher memory bandwidth of the GPU, the new algorithm also performs better on the GPU than on the CPU.
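
The citation statement at the head of this entry distinguishes a local collision step from a streaming step that couples neighbouring nodes, and the abstract uses the two-lattice LBM as its baseline. The following is a minimal pure-Python sketch of that baseline for D2Q9 with the BGK collision operator and periodic boundaries. The function names, the fused collide-then-stream ordering, and the relaxation parameter `tau` are our own choices. This code does not implement the memory-reducing algorithm of Sheida et al.

```python
# Standard D2Q9 lattice velocities and weights.
C = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1), (1, 1), (-1, 1), (-1, -1), (1, -1)]
W = [4 / 9] + [1 / 9] * 4 + [1 / 36] * 4


def equilibrium(rho, ux, uy):
    """Second-order D2Q9 equilibrium distribution (lattice sound speed c_s^2 = 1/3)."""
    usq = ux * ux + uy * uy
    feq = []
    for (cx, cy), w in zip(C, W):
        cu = cx * ux + cy * uy
        feq.append(w * rho * (1 + 3 * cu + 4.5 * cu * cu - 1.5 * usq))
    return feq


def collide(f, tau):
    """BGK collision: purely local, it reads and writes only the 9 populations of one node."""
    rho = sum(f)
    ux = sum(fi * cx for fi, (cx, _) in zip(f, C)) / rho
    uy = sum(fi * cy for fi, (_, cy) in zip(f, C)) / rho
    feq = equilibrium(rho, ux, uy)
    return [fi - (fi - fe) / tau for fi, fe in zip(f, feq)]


def step_two_lattice(f_src, nx, ny, tau):
    """One collide-and-stream step; f_src[x][y] holds the 9 populations of node (x, y).

    The step reads only f_src and writes only a second lattice f_dst (periodic
    boundaries), which is why the two-lattice scheme keeps two full copies of the data.
    """
    f_dst = [[[0.0] * 9 for _ in range(ny)] for _ in range(nx)]
    for x in range(nx):
        for y in range(ny):
            post = collide(f_src[x][y], tau)
            # Streaming: each post-collision population moves to the neighbour along c_q.
            for q, (cx, cy) in enumerate(C):
                f_dst[(x + cx) % nx][(y + cy) % ny][q] = post[q]
    return f_dst


def two_lattice_bytes(nodes, q=9, bytes_per_value=8):
    """Population storage of a two-lattice scheme: 2 copies x Q values per node."""
    return 2 * q * nodes * bytes_per_value
```

In double precision, the two-lattice D2Q9 baseline therefore stores `two_lattice_bytes(1) == 144` bytes of populations per node. If the reported 33% reduction refers to population storage only, it would bring this to roughly 96 bytes per node.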
