Direct-summation N-body algorithms compute the gravitational interactions between stars exactly and have a computational complexity of O(N²). Performance can be greatly enhanced by special-purpose accelerator boards such as the GRAPE-6A; however, the memory of the GRAPE boards is limited. Here we present a performance analysis of direct N-body codes on two parallel supercomputers that incorporate special-purpose boards, allowing as many as four million particles to be integrated. Both computers employ high-speed InfiniBand interconnects to minimize communication overhead, which can otherwise become significant due to the small number of "active" particles at each time step. We find that the computation time scales well with processor number; for 2×10⁶ particles, efficiencies greater than 50% and speeds in excess of 2 TFlops are reached.
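The O(N²) direct-summation scheme described above can be sketched in a few lines. The following is an illustrative Python/NumPy version only (the function name, the softening length `eps`, and G = 1 units are assumptions for the example), not the accelerator kernels run on the GRAPE boards:

```python
import numpy as np

def direct_summation_accel(pos, mass, eps=1e-4):
    """O(N^2) pairwise gravitational accelerations, G = 1 units.

    pos  : (N, 3) array of positions
    mass : (N,) array of masses
    eps  : softening length (assumed for the example)
    """
    n = len(pos)
    acc = np.zeros_like(pos)
    for i in range(n):
        dr = pos - pos[i]                        # vectors from body i to all bodies
        r2 = (dr * dr).sum(axis=1) + eps * eps   # softened squared distances
        inv_r3 = r2 ** -1.5
        inv_r3[i] = 0.0                          # exclude self-interaction
        acc[i] = (mass[:, None] * dr * inv_r3[:, None]).sum(axis=0)
    return acc
```

Every body interacts with every other body, which is what makes the cost quadratic in N and the algorithm such a natural fit for special-purpose pipelines.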
We present Sapporo, a library for performing high-precision gravitational N-body simulations on NVIDIA Graphical Processing Units (GPUs). Our library mimics the GRAPE-6 library, and N-body codes currently running on GRAPE-6 can switch to Sapporo by simply relinking the library. The precision of our library is comparable to that of GRAPE-6, even though the GPU hardware is internally limited to single-precision arithmetic. This limitation is effectively overcome by emulating double precision when calculating the distance between particles. The performance loss of this operation is small (≲20%) compared to the advantage of being able to run at high precision. We tested the library using several GRAPE-6-enabled N-body codes, in particular Starlab and phiGRAPE. We measured a peak performance of 800 Gflop/s when running with 10⁶ particles on a PC with four commercial G92-architecture GPUs (two GeForce 9800GX2 cards). As a production test, we simulated a 32k Plummer model with equal-mass stars well beyond core collapse. The simulation took 41 days, during which the mean performance was 113 Gflop/s. The GPUs did not show any problems from running in a production environment for such an extended period of time.

Introduction

Graphical processing units (GPUs) are quickly becoming mainstream in computational science. The introduction of the Compute Unified Device Architecture (CUDA; Fernando, 2004), in which GPUs can be programmed effectively, has generated a paradigm shift in scientific computing (Hoekstra et al., 2007). Modern GPUs are greener in terms of CO₂ production, have a smaller footprint, are cheaper, and are as easy to program as traditional parallel computers. In addition, there is no waiting queue when running large simulations on a local GPU-equipped workstation. Newtonian stellar dynamics has traditionally been at the forefront of high-performance computing.
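The double-precision emulation mentioned in the abstract above is commonly implemented with "double-single" arithmetic, in which each coordinate is stored as a pair of single-precision numbers (hi, lo). The following Python sketch (function names are illustrative, and Sapporo's actual kernels run on the GPU, not in NumPy) shows why the trick recovers precision when subtracting two nearly equal positions:

```python
import numpy as np

def to_double_single(x):
    """Split a float64 into a (hi, lo) pair of float32 values."""
    hi = np.float32(x)
    lo = np.float32(x - np.float64(hi))  # residual the hi part could not hold
    return hi, lo

def ds_sub(a, b):
    """Difference of two double-single numbers.

    For nearly equal operands the hi-part subtraction is exact,
    and the lo parts restore the bits a plain float32 subtraction
    would lose to cancellation.
    """
    hi = np.float32(a[0] - b[0])
    lo = np.float32(a[1] - b[1])
    return np.float32(hi + lo)
```

The payoff shows up exactly in the distance calculation: two particles that are close together but far from the coordinate origin differ only in bits that plain float32 cannot represent, while the (hi, lo) pair still can, at roughly the ≲20% overhead quoted above.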
The first dedicated Newtonian solver (Applegate et al., 1986) was used to study the stability of the solar system (Sussman and Wisdom, 1992). Soon, even faster specialized hardware arrived with the introduction of the GRAPE family of computers, which has an impressive history of breaking computing speed records (Makino and Taiji, 1998). Nowadays, GPUs are used in various scientific areas, such as molecular dynamics (Anderson et al., 2008; van Meel et al., 2008), solving Kepler's equations (Ford, 2009), and Newtonian N-body simulations. Solving the Newtonian N-body problem with GPUs started in the early 2000s by adopting a shared-time-step algorithm with a 2nd-order integrator (Nyland et al., 2004). A few years later this algorithm was improved to include individual time steps and a higher-order integrator, in a code written in the device-specific language Cg (Fernando and Kilgard, 2003). The performance was still relatively low compared to later implementations in CUDA via the Cunbody package (Hamada and Iitaka, 2007), the Kirin library, and the Yebisu N-body code (Nitadori and Makino, 2008; Nitadori, 2009). The main p...
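A shared-time-step, 2nd-order scheme of the kind attributed above to Nyland et al. (2004) is typically a leapfrog integrator, in which all particles advance with one common step. A minimal kick-drift-kick sketch (the function name and interface are assumptions for illustration, not the cited implementation):

```python
import numpy as np

def leapfrog_step(pos, vel, accel_fn, dt):
    """One shared-time-step kick-drift-kick leapfrog step (2nd order).

    pos, vel : (N, d) arrays of positions and velocities
    accel_fn : callable mapping positions to accelerations
    dt       : the single time step shared by all particles
    """
    vel_half = vel + 0.5 * dt * accel_fn(pos)           # kick (half step)
    pos_new = pos + dt * vel_half                       # drift (full step)
    vel_new = vel_half + 0.5 * dt * accel_fn(pos_new)   # kick (half step)
    return pos_new, vel_new
```

Leapfrog is symplectic, so its energy error stays bounded over long integrations; the individual-time-step refinements mentioned above let particles in dense regions take smaller steps than the rest.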
We present the results of the "Cosmogrid" cosmological N-body simulation suite, based on the concordance ΛCDM model. The Cosmogrid simulation was performed in a 30 Mpc box with 2048³ particles. The mass of each particle is 1.28×10⁵ M⊙, which is sufficient to resolve ultra-faint dwarfs. We find that the halo mass function shows good agreement with the Sheth & Tormen fitting function down to ∼10⁷ M⊙. We have analyzed the spherically averaged density profiles of the three most massive halos, which are of galaxy-group size and contain at least 170 million particles. The slopes of these density profiles become shallower than −1 at the innermost radius. We also find a clear correlation of halo concentration with mass. The mass dependence of the concentration parameter cannot be expressed by a single power law; however, a simple model based on the Press-Schechter theory proposed by Navarro et al. gives reasonable agreement with this dependence. The spin parameter does not show a correlation with the halo mass. The probability distribution functions of both concentration and spin are well fitted by the log-normal distribution for halos with masses larger than ∼10⁸ M⊙. The subhalo abundance depends on the halo mass: galaxy-sized halos have 50% more subhalos than ∼10¹¹ M⊙ halos.
We have performed a series of N-body simulations to model the Arches cluster. Our aim is to find the best-fitting model for the Arches cluster by comparing our simulations with observational data, and to constrain the parameters for the initial conditions of the cluster. By neglecting the Galactic potential and stellar evolution, we are able to efficiently search through a large parameter space to determine, for example, the initial mass function (IMF), size, and mass of the cluster. We find that the cluster's observed present-day mass function can be well explained with an initial Salpeter IMF. The lower mass limit of the IMF cannot be constrained well from our models. In our best models, the initial total mass down to a mass limit of 0.5 M⊙ is (4.9 ± 0.8) × 10⁴ M⊙. The initial virial radius of the cluster is 0.77 ± 0.12 pc. A concentration parameter of W₀ = 3 for the initial King model gives the best results.