2013
DOI: 10.1007/978-3-642-38750-0_2
Up to 700k GPU Cores, Kepler, and the Exascale Future for Simulations of Star Clusters Around Black Holes

Abstract: We present direct astrophysical N-body simulations with up to a few million bodies using our parallel MPI/CUDA code on large GPU clusters in China, Ukraine and Germany, with different kinds of GPU hardware. These clusters are directly linked under the Chinese Academy of Sciences special GPU cluster program in cooperation with the ICCS (International Center for Computational Science). We reach about half the peak Kepler K20 GPU performance for our ϕ-GPU code [2], in a real application scenario wit…

Cited by 15 publications (12 citation statements); references 32 publications (27 reference statements).
“…For the two BHB dynamical orbit integration, we use the publicly available jGPU * [7,8] with a 4th order Hermite integrator and block hierarchical individual time step scheme. This Hermite scheme requires us to know the acceleration and its first time-derivative, called jerk.…”
Section: Some Numerical Details
confidence: 99%
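The jerk mentioned in the quote above is the analytic first time derivative of the pairwise Newtonian acceleration, which a 4th-order Hermite integrator needs alongside the acceleration itself. A minimal pure-Python sketch of the direct O(N²) summation (function and argument names are illustrative; production codes such as ϕ-GPU add softening and run this loop as a GPU kernel):

```python
import math

def acc_and_jerk(pos, vel, mass, G=1.0):
    """Direct O(N^2) Newtonian acceleration and jerk (da/dt) for every
    body.  pos/vel are lists of 3-tuples, mass is a list of scalars.
    Illustrative sketch only: no softening, no vectorization."""
    n = len(pos)
    acc = [[0.0] * 3 for _ in range(n)]
    jrk = [[0.0] * 3 for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            r = [pos[j][k] - pos[i][k] for k in range(3)]  # separation
            v = [vel[j][k] - vel[i][k] for k in range(3)]  # relative velocity
            r2 = sum(c * c for c in r)
            r3 = r2 * math.sqrt(r2)
            rv = sum(r[k] * v[k] for k in range(3))        # r . v
            for k in range(3):
                # a_ij = G m_j r / |r|^3
                acc[i][k] += G * mass[j] * r[k] / r3
                # jerk_ij = d(a_ij)/dt = G m_j [ v/|r|^3 - 3 (r.v) r / |r|^5 ]
                jrk[i][k] += G * mass[j] * (v[k] / r3 - 3.0 * rv * r[k] / (r3 * r2))
    return acc, jrk
```

With acceleration and jerk in hand, the Hermite scheme predicts positions and velocities with a Taylor series and corrects them once the new forces are known.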
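The block hierarchical individual time step scheme from the same quote can be illustrated by quantizing each particle's natural step onto a power-of-two hierarchy, so that groups of particles share synchronization points and can be advanced together. A sketch under simple assumptions (the eta·|a|/|jerk| criterion is one common illustrative choice, not necessarily the one used in jGPU):

```python
import math

def block_time_step(a_mag, jerk_mag, t_now, eta=0.02, dt_max=0.125):
    """Quantize a candidate individual time step onto the power-of-two
    block hierarchy.  Assumes t_now is itself block-synchronized (a sum
    of powers of two), as it is in a block-step integrator."""
    dt = eta * a_mag / jerk_mag          # candidate step from acc and jerk
    dt = min(dt, dt_max)
    # round down to the nearest power of two: dt_block = 2**floor(log2 dt)
    dt_block = 2.0 ** math.floor(math.log2(dt))
    # halve until the step is commensurate with the current time, so the
    # particle lands exactly on a block boundary
    while t_now % dt_block != 0.0:
        dt_block *= 0.5
    return dt_block
```

Because every step is a power of two, all particles due at a given time form a "block" whose forces can be evaluated in one parallel sweep, which is what makes the scheme GPU-friendly.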
“…For the gravitational N-body problem these accelerators give a manifold speed increase with respect to code running on a CPU (Nyland et al 2007; Portegies Zwart et al 2007; Bédorf & Portegies Zwart 2012). Parallel computers equipped with more than one hundred GPUs have been utilized for various studies (Berczik et al 2011, 2013) and have been run efficiently in parallel to provide the computational power necessary to perform direct many-body simulations. Access to such large GPU-equipped supercomputers, however, is not easy, in particular when the computations require a considerable fraction of the available hardware.…”
Section: The Fly-by Star Perturbation
confidence: 99%
“…Our test simulation with ∼7.37·10^5 particles achieved ∼800 times acceleration with just two GPUs in single precision, compared to one Xeon E5520 CPU node used in the MPI-OpenMP benchmark. We have sustained 9.3 Tflop/s on this node, which is ∼24% of the theoretical peak performance and roughly 50% of the efficiency of the ϕ-GPU direct N-body code [19]. It is also clear from this test that using multiple GPUs is only worthwhile if the particle number is large enough.…”
Section: Force Calculation and Parallelization Methods
confidence: 86%
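The quoted benchmark figures can be sanity-checked with back-of-envelope arithmetic: the implied theoretical peak follows from the sustained rate and the quoted efficiency, and the cost of one full force loop follows from the O(N²) interaction count. The ~60 flops per pairwise interaction used below is a common N-body benchmarking convention, not a figure from the paper:

```python
# Values quoted in the citation above
sustained_tflops = 9.3            # measured on the two-GPU node
fraction_of_peak = 0.24           # quoted ~24% of theoretical peak
peak_tflops = sustained_tflops / fraction_of_peak
print(f"implied theoretical peak: {peak_tflops:.1f} Tflop/s")

# A direct N-body force evaluation costs N^2 interactions; assuming a
# conventional ~60 flops per pairwise force, the quoted particle count
# implies the wall time of one full force loop at the sustained rate:
n = 7.37e5
flops_per_step = 60 * n * n
seconds_per_step = flops_per_step / (sustained_tflops * 1e12)
print(f"{seconds_per_step:.2f} s per full force evaluation")
```

This order-of-seconds cost per force loop is exactly why individual block time steps matter: only a small block of particles needs new forces at most steps.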