2008 DoD HPCMP Users Group Conference 2008
DOI: 10.1109/dod.hpcmp.ugc.2008.12

Exploring New Architectures in Accelerating CFD for Air Force Applications

Abstract: Computational Fluid Dynamics (CFD) is an active field of research where the development of faster and more accurate methods is linked to the continuous demand for ever higher computational power. And indeed, for at least two decades, high-performance computing (HPC) programmers have taken for granted that each successive generation of microprocessors would, either immediately or after minor adjustments, make their software run substantially faster. But recent microprocessor design trends including t…

Cited by 23 publications (15 citation statements)
References 14 publications

“…In May, Volkov and Demmel [35] described LU, QR, and Cholesky factorizations running at up to 180 GFlop/s in single precision (with QR a little bit more). The first results on a pre-released next-generation G90 NVIDIA card were presented at UGC2008 in May, where Dongarra et al [11] reported Cholesky running at up to 327 GFlop/s in single precision. Using again the newest generation card, in this paper, we describe an LU algorithm running at up to 388 GFlop/s in single precision and 99.4 Gflop/s in double precision.…”
Section: GPUs for DLA (mentioning)
confidence: 99%
“…This is illustrated for Cholesky factorization (so called left-looking version) in Fig. 2 (the case reported in [11]). The matrix to be factorized is allocated on the GPU memory and the code is as in LAPACK with BLAS calls replaced by CUBLAS, which represents the first idea from the list above.…”
Section: GPUs for DLA (mentioning)
confidence: 99%
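
The left-looking ordering mentioned in the statement above can be illustrated with a minimal sketch. This is not the cited GPU code: it is an unblocked, plain-Python version on nested lists, and the function name `left_looking_cholesky` is hypothetical. It only shows the order of operations, where each column is first updated from the already-finished columns to its left and then scaled. In a blocked LAPACK-style version those two steps would become BLAS calls, which is where a library such as CUBLAS would be substituted.

```python
import math


def left_looking_cholesky(a):
    """Return lower-triangular L (list of lists) such that L * L^T equals a.

    Assumes `a` is a symmetric positive definite matrix given as a list of rows.
    Illustrative only: unblocked and CPU-only, unlike the cited GPU code.
    """
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Start from the lower part of column j of the input matrix.
        col = [a[i][j] for i in range(j, n)]
        # Left-looking update: subtract contributions of finished columns 0..j-1.
        for k in range(j):
            ljk = L[j][k]
            for i in range(j, n):
                col[i - j] -= L[i][k] * ljk
        if col[0] <= 0.0:
            raise ValueError("matrix is not positive definite")
        # Factor the diagonal entry, then scale the rest of the column.
        diag = math.sqrt(col[0])
        L[j][j] = diag
        for i in range(j + 1, n):
            L[i][j] = col[i - j] / diag
    return L


if __name__ == "__main__":
    A = [[4.0, 12.0, -16.0],
         [12.0, 37.0, -43.0],
         [-16.0, -43.0, 98.0]]
    for row in left_looking_cholesky(A):
        print(row)
```

A right-looking variant would instead update the whole trailing submatrix after each column is finished; the left-looking order defers that work until a column is actually needed.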
“…Their use in general-purpose computations [24], and more specifically in CFD [5], is promising. Successful attempts were made to implement LBM solvers on the GPU [6].…”
Section: Introduction (mentioning)
confidence: 99%
“…Nevertheless, this approach can lead to high performance, but only after some modifications and for routines that map well on the GPU, like Cholesky (e.g. Dongarra et al [8] report up to 327 GFlop/s in single precision on a pre-released at the time NVIDIA T10P). Naturally, previous attempts to wrap some of the work needed in transitions like this in frameworks, have also failed to produce convincing results.…”
Section: Introduction (mentioning)
confidence: 99%