High-performance parallel implicit CFD (2001)
DOI: 10.1016/s0167-8191(00)00075-2

Cited by: 139 publications (95 citation statements)
References: 19 publications

“…The continuously improving floating-point performance of the last few generations of microprocessors, and the availability of ever cheaper high-speed interconnection networks, have meant that PC clusters (distributed memory) are increasingly being adopted as a cost-effective alternative to classical parallel supercomputers (shared memory) for running large-scale numerical simulations [19].…”
Section: Introduction (mentioning, confidence 99%)

“…More recent work [4,8] also focuses on techniques for multigrid on unstructured meshes. Keyes et al. have applied data layout optimization and data access transformation techniques to other iterative methods [6]. Genius et al. have proposed an automatable method, based on a meeting graph, to guide array merging for stencil-based codes [5].…”
Section: Introduction (mentioning, confidence 99%)
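
To make the array-merging idea concrete, here is a minimal C sketch; it is not the method of Genius et al., and the field names u and v are invented for the example. Two arrays that a stencil always reads together are merged into one array of structs, so each cache line fetched inside the loop carries both values:

/* Hypothetical illustration of array merging for a stencil code. */
#include <stdio.h>

#define N 1024

/* Separate layout: u[i] and v[i] live in distant cache lines. */
static double u[N], v[N];

/* Merged layout: one struct per grid point keeps u and v adjacent. */
typedef struct { double u, v; } point_t;
static point_t merged[N];

static void stencil_separate(double *out) {
    for (int i = 1; i < N - 1; i++)
        out[i] = 0.5 * (u[i-1] + u[i+1]) + v[i];   /* two memory streams */
}

static void stencil_merged(double *out) {
    for (int i = 1; i < N - 1; i++)
        out[i] = 0.5 * (merged[i-1].u + merged[i+1].u)
               + merged[i].v;                      /* one memory stream */
}

int main(void) {
    double out[N] = {0};
    for (int i = 0; i < N; i++) {
        u[i] = v[i] = (double)i;
        merged[i].u = merged[i].v = (double)i;
    }
    stencil_separate(out);
    stencil_merged(out);
    printf("out[1] = %g\n", out[1]);
    return 0;
}

Merging pays off only when the fields really are accessed together; if a later loop touches u alone, the interleaved v values waste half of each cache line, which is why such transformations benefit from analysis or profiling guidance.
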
“…In this paper, we evaluate the hybrid programming model using memory performance as a metric in the context of an unstructured implicit CFD code, PETSc-FUN3D [2]. The performance of many scientific computing codes is dependent on the performance of the memory subsystem, including the available memory bandwidth, memory latency, number and sizes of caches, etc.…”
(mentioning, confidence 99%)
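
The hybrid model being evaluated can be pictured with a short C sketch; the dot-product kernel and the slice size are invented for the example and are not taken from PETSc-FUN3D. MPI ranks own disjoint slices of a vector while OpenMP threads share each rank's slice, and the streaming loop's rate is set by memory bandwidth rather than peak flop rate:

/* Minimal sketch of the hybrid MPI + OpenMP programming model. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    int provided, rank;
    /* FUNNELED suffices: only the master thread makes MPI calls. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int n = 1 << 20;               /* local slice length (made up) */
    double *x = malloc(n * sizeof *x);
    double *y = malloc(n * sizeof *y);
    for (int i = 0; i < n; i++) { x[i] = 1.0; y[i] = 2.0; }

    /* OpenMP threads split this rank's slice; each element is read
     * once, so the loop streams memory instead of reusing cache. */
    double local = 0.0;
    #pragma omp parallel for reduction(+:local)
    for (int i = 0; i < n; i++)
        local += x[i] * y[i];

    /* MPI combines the per-rank partial sums across distributed memory. */
    double global = 0.0;
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0) printf("dot = %g\n", global);
    free(x); free(y);
    MPI_Finalize();
    return 0;
}
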
“…Each of these groups of tasks stresses a different subsystem of contemporary high-performance computers. After tuning, linear algebraic recurrences run at close to the aggregate memory-bandwidth limit on performance, flux computation loops over edges are bounded either by memory bandwidth or instruction scheduling, and parallel efficiency is bounded primarily by slight load imbalances at synchronization points [2].…”
(mentioning, confidence 99%)
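
The edge-based flux loops mentioned here can be illustrated with a toy C sketch; the connectivity and the central "flux" below are stand-ins, not the actual FUN3D discretization. Each edge gathers the states of its two endpoint nodes and scatter-adds a flux into both residuals, and it is these indirect gathers and scatters that tie the loop to memory bandwidth and instruction scheduling:

/* Sketch of an edge-based flux loop on a toy unstructured graph. */
#include <stdio.h>

#define NNODES 5
#define NEDGES 4

int main(void) {
    /* Edge e connects nodes e0[e] and e1[e] (made-up connectivity). */
    int e0[NEDGES] = {0, 1, 2, 3};
    int e1[NEDGES] = {1, 2, 3, 4};
    double q[NNODES]   = {1.0, 2.0, 4.0, 8.0, 16.0};  /* node states */
    double res[NNODES] = {0};                          /* residuals  */

    for (int e = 0; e < NEDGES; e++) {
        int i = e0[e], j = e1[e];
        double flux = 0.5 * (q[i] + q[j]);  /* toy central flux */
        res[i] += flux;                     /* scatter-add to both */
        res[j] -= flux;                     /* endpoints of the edge */
    }

    for (int n = 0; n < NNODES; n++)
        printf("res[%d] = %g\n", n, res[n]);
    return 0;
}
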