High-performance parallel implicit CFD (2001)
DOI: 10.1016/s0167-8191(00)00075-2

Cited by: 139 publications (95 citation statements)
References: 19 publications

“…The continuously improving floating-point performance of the last few generations of microprocessors, and the availability of ever cheaper high-speed interconnection networks, have meant that PC clusters (distributed memory) are increasingly being adopted as a cost-effective alternative to classical parallel supercomputers (shared memory) for running large-scale numerical simulations [19].…”
Section: Introduction (mentioning, confidence 99%)

“…More recent work [4,8] also focuses on techniques for multigrid on unstructured meshes. Keyes et al. have applied data layout optimization and data access transformation techniques to other iterative methods [6]. Genius et al. have proposed an automatable method, based on a meeting graph, to guide array merging for stencil-based codes [5].…”
Section: Introduction (mentioning, confidence 99%)
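
To make the array-merging idea concrete, here is a minimal C sketch; it is not the method of Genius et al., and the field names u and v are invented for the example. Two arrays that a stencil always reads together are merged into one array of structs, so each cache line fetched inside the loop carries both values:

/* Hypothetical illustration of array merging for a stencil code. */
#include <stdio.h>

#define N 1024

/* Separate layout: u[i] and v[i] live in distant cache lines. */
static double u[N], v[N];

/* Merged layout: one struct per grid point keeps u and v adjacent. */
typedef struct { double u, v; } point_t;
static point_t merged[N];

static void stencil_separate(double *out) {
    for (int i = 1; i < N - 1; i++)
        out[i] = 0.5 * (u[i-1] + u[i+1]) + v[i];   /* two memory streams */
}

static void stencil_merged(double *out) {
    for (int i = 1; i < N - 1; i++)
        out[i] = 0.5 * (merged[i-1].u + merged[i+1].u)
               + merged[i].v;                      /* one memory stream */
}

int main(void) {
    double out[N] = {0};
    for (int i = 0; i < N; i++) {
        u[i] = v[i] = (double)i;
        merged[i].u = merged[i].v = (double)i;
    }
    stencil_separate(out);
    stencil_merged(out);
    printf("out[1] = %g\n", out[1]);
    return 0;
}

Merging pays off only when the fields really are accessed together; if a later loop touches u alone, the interleaved v values waste half of each cache line, which is why such transformations benefit from analysis or profiling guidance.
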
“…In this paper, we evaluate the hybrid programming model using memory performance as a metric in the context of an unstructured implicit CFD code, PETSc-FUN3D [2]. The performance of many scientific computing codes is dependent on the performance of the memory subsystem, including the available memory bandwidth, memory latency, number and sizes of caches, etc.…”
(mentioning, confidence 99%)
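
The hybrid model being evaluated can be pictured with a short C sketch; the dot-product kernel and the slice size are invented for the example and are not taken from PETSc-FUN3D. MPI ranks own disjoint slices of a vector while OpenMP threads share each rank's slice, and the streaming loop's rate is set by memory bandwidth rather than peak flop rate:

/* Minimal sketch of the hybrid MPI + OpenMP programming model. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    int provided, rank;
    /* FUNNELED suffices: only the master thread makes MPI calls. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int n = 1 << 20;               /* local slice length (made up) */
    double *x = malloc(n * sizeof *x);
    double *y = malloc(n * sizeof *y);
    for (int i = 0; i < n; i++) { x[i] = 1.0; y[i] = 2.0; }

    /* OpenMP threads split this rank's slice; each element is read
     * once, so the loop streams memory instead of reusing cache. */
    double local = 0.0;
    #pragma omp parallel for reduction(+:local)
    for (int i = 0; i < n; i++)
        local += x[i] * y[i];

    /* MPI combines the per-rank partial sums across distributed memory. */
    double global = 0.0;
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0) printf("dot = %g\n", global);
    free(x); free(y);
    MPI_Finalize();
    return 0;
}
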
“…Each of these groups of tasks stresses a different subsystem of contemporary high-performance computers. After tuning, linear algebraic recurrences run at close to the aggregate memory-bandwidth limit on performance, flux computation loops over edges are bounded either by memory bandwidth or instruction scheduling, and parallel efficiency is bounded primarily by slight load imbalances at synchronization points [2].…”
(mentioning, confidence 99%)
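
The edge-based flux loops mentioned here can be illustrated with a toy C sketch; the connectivity and the central "flux" below are stand-ins, not the actual FUN3D discretization. Each edge gathers the states of its two endpoint nodes and scatter-adds a flux into both residuals, and it is these indirect gathers and scatters that tie the loop to memory bandwidth and instruction scheduling:

/* Sketch of an edge-based flux loop on a toy unstructured graph. */
#include <stdio.h>

#define NNODES 5
#define NEDGES 4

int main(void) {
    /* Edge e connects nodes e0[e] and e1[e] (made-up connectivity). */
    int e0[NEDGES] = {0, 1, 2, 3};
    int e1[NEDGES] = {1, 2, 3, 4};
    double q[NNODES]   = {1.0, 2.0, 4.0, 8.0, 16.0};  /* node states */
    double res[NNODES] = {0};                          /* residuals  */

    for (int e = 0; e < NEDGES; e++) {
        int i = e0[e], j = e1[e];
        double flux = 0.5 * (q[i] + q[j]);  /* toy central flux */
        res[i] += flux;                     /* scatter-add to both */
        res[j] -= flux;                     /* endpoints of the edge */
    }

    for (int n = 0; n < NNODES; n++)
        printf("res[%d] = %g\n", n, res[n]);
    return 0;
}
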