Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis 2011
DOI: 10.1145/2063384.2063458

Extracting ultra-scale Lattice Boltzmann performance via hierarchical and distributed auto-tuning

Abstract: We are witnessing a rapid evolution of HPC node architectures and on-chip parallelism as power and cooling constraints limit increases in microprocessor clock speeds. In this work, we demonstrate a hierarchical approach towards effectively extracting performance for a variety of emerging multicore-based supercomputing platforms. Our examined application is a structured grid-based Lattice Boltzmann computation that simulates homogeneous isotropic turbulence in magnetohydrodynamics. First, we examine sophisticat…

Citations: cited by 30 publications (25 citation statements)
References: 27 publications
“…In previous tuning studies conducted on Blue Gene/P, flat MPI and MPI/OpenMP programming models were shown to offer similar performance results for the LBM [21]. We found similar results as depicted in Fig.…”
Section: B Hybrid Implementation
Citation type: supporting
confidence: 82%
“…Each processor adds an extra row to its domain of interest, as shown in blue, and populates this ghost cell row with the border data from its neighboring processor. The use of an extra row of ghost cells is often found in large-scale models [21], [18]; however, by stopping at one row, potential for further tuning is being left unexplored. Kjolstad and Snir discussed implementation methods of the ghost cell pattern in [22] and suggested the investigation of deep halos as a potential method to trade off computation for communication.…”
Section: A Deep Halo Ghost Cells
Citation type: mentioning
confidence: 99%
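
The trade-off named in the statement above (a deeper halo means fewer exchanges but more redundant computation) can be illustrated with a minimal sketch. It is not the code of the cited papers and it is not a Lattice Boltzmann kernel: it assumes a 1D three-point averaging stencil, two subdomains held in plain Python lists, and hypothetical function names (`step`, `run_global`, `run_deep_halo`). Each "exchange" is a list copy standing in for a message between neighboring processors.

```python
def step(u):
    """One 3-point averaging sweep; the two end cells are held fixed."""
    interior = [(u[i - 1] + u[i] + u[i + 1]) / 3.0 for i in range(1, len(u) - 1)]
    return [u[0]] + interior + [u[-1]]


def run_global(u, steps):
    """Reference result: sweep the whole domain without any decomposition."""
    for _ in range(steps):
        u = step(u)
    return u


def run_deep_halo(u, steps, depth):
    """Two subdomains with a halo `depth` cells wide (illustrative only).

    After one exchange, each side can take `depth` local steps before its
    owned cells would be contaminated by stale halo data, so the number of
    exchanges drops by a factor of `depth` at the cost of sweeping the
    halo cells redundantly on both sides.
    """
    mid = len(u) // 2
    left, right = u[:mid], u[mid:]
    if depth < 1 or depth > min(len(left), len(right)):
        raise ValueError("halo depth must fit inside the neighboring subdomain")
    exchanges = 0
    done = 0
    while done < steps:
        k = min(depth, steps - done)
        # "Exchange": each side copies k border cells from its neighbor.
        left_ext = left + right[:k]
        right_ext = left[-k:] + right
        exchanges += 1
        # k local sweeps; the stale region grows by one cell per sweep,
        # starting from the outer edge of the halo.
        for _ in range(k):
            left_ext = step(left_ext)
            right_ext = step(right_ext)
        # Keep only the owned cells, which are still exact after k sweeps.
        left = left_ext[:mid]
        right = right_ext[k:]
        done += k
    return left + right, exchanges


if __name__ == "__main__":
    u0 = [float((i * i) % 7) for i in range(16)]
    reference = run_global(u0, 6)
    for depth in (1, 2, 3):
        result, exchanges = run_deep_halo(u0, 6, depth)
        worst = max(abs(a - b) for a, b in zip(result, reference))
        print(depth, exchanges, worst)
```

In this sketch a depth of 1 corresponds to the single ghost row described in the statement, and larger depths reduce the exchange count while both sides sweep the overlapping cells. Whether that pays off on a real machine depends on message latency and on the cost of the redundant work, which this toy example does not measure.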
“…Our high-dimensional data example is based on a study of auto-tuning strategies for HPC systems [Williams et al 2011] with the goal of designing an auto-tuner for large complicated algorithm runs. This study uses a lattice Boltzmann magnetohydrodynamics algorithm and its performance evaluation on different HPC systems to explore the available parameters to tune.…”
Section: Applications and Results
Citation type: mentioning
confidence: 99%
“…Our performance model is based on the observation that the LBM is typically a memory bandwidth limited algorithm [3,[29][30][31][32]. The EsoTwist method for a 27 speed lattice requires 27 distributions that have to be read and written in every time step.…”
Section: Performance Model
Citation type: mentioning
confidence: 99%
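
The bandwidth-limited reasoning in the statement above can be turned into a rough upper bound on node updates per second. The sketch below is an assumption-laden illustration, not the citing paper's actual performance model: it assumes double-precision values (8 bytes each), counts only the 27 reads and 27 writes mentioned in the statement, ignores any other memory traffic, and uses a purely hypothetical bandwidth of 100 GB/s. The function names are invented for this example.

```python
def bytes_per_node_update(num_distributions=27, bytes_per_value=8):
    """Bytes moved per lattice node per time step.

    Assumes every distribution is read once and written once, and that
    values are stored in double precision (8 bytes); both are assumptions.
    """
    return 2 * num_distributions * bytes_per_value


def max_node_updates_per_second(bandwidth_bytes_per_s,
                                num_distributions=27,
                                bytes_per_value=8):
    """Upper bound on node updates per second if memory bandwidth is the only limit."""
    return bandwidth_bytes_per_s / bytes_per_node_update(num_distributions,
                                                         bytes_per_value)


if __name__ == "__main__":
    hypothetical_bandwidth = 100e9  # 100 GB/s, chosen only for illustration
    print(bytes_per_node_update())
    print(max_node_updates_per_second(hypothetical_bandwidth) / 1e6)
```

Under these assumptions each node update moves 2 × 27 × 8 = 432 bytes, so the bound scales linearly with the sustained memory bandwidth of the machine. Storing values in single precision would halve the bytes per update in this estimate.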