2019
DOI: 10.1177/1094342019840806

Dual tree traversal on integrated GPUs for astrophysical N-body simulations

Abstract: In astrophysical N-body simulations, O(N) fast multipole methods (FMMs) with dual tree traversal (DTT) on multi-core CPUs are faster than O(N log N) CPU tree-codes but can still be outperformed by GPU ones. In this article, we aim at combining the best algorithm, namely FMM with DTT, with the most powerful hardware currently available, namely GPUs. In the astrophysical context requiring low accuracies and non-uniform particle distributions, we show that such combination can be achieved thanks to a hybrid CPU…

Cited by 7 publications (4 citation statements)
References: 27 publications
“…For this purpose state-of-the-art hardware infrastructure (like GPUs) is required to carry out parallel processing. Another approach is to exploit distributed or cloud computing in order to achieve parallelism [19,20].…”
Section: Related Work (mentioning)
confidence: 99%
“…However, these codes still scale like O(N²) and there is great interest in implementing the sub-quadratic scaling tree-based methods on GPUs, although this is challenging due to the complexity of these methods in comparison with direct summation. In this direction there have been several GPU implementations of the TC and FMM [39,40,41,42,36,43,44,45,46,47], but we know of only a few GPU implementations of the DTT [48,49].…”
Section: GPU Implementations (mentioning)
confidence: 99%
“…We use OpenMP to speed up the dual-tree walk using task model with atomic clause used to update of multipole moments (Fortin & Touche 2019). The near field contribution is handled by a direct summation kernel, which is vectorized using AVX intrinsics as in Zhu (2020).…”
Section: ) (mentioning)
confidence: 99%
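
The citation statements above refer to dual tree traversal (DTT), in which far-field work is handled cell-to-cell and near-field work by direct summation. As a rough illustration only, the sketch below builds a 1D binary tree and walks it against itself, sorting cell pairs into a far list (well separated under an opening-angle test) and a near list (leaf-leaf pairs left for direct summation). It is not the implementation of the indexed paper or of any citing work: it is serial, 1D, and computes no multipole expansions. The names `Cell`, `build`, `traverse`, `interaction_lists`, and the parameters `theta` and `leaf_size` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Cell:
    """A 1D tree cell (hypothetical structure for this sketch)."""
    lo: float
    hi: float
    idx: list                      # indices of the particles inside the cell
    children: list = field(default_factory=list)

    @property
    def center(self):
        return 0.5 * (self.lo + self.hi)

    @property
    def radius(self):
        return 0.5 * (self.hi - self.lo)


def build(xs, idx, lo, hi, leaf_size=2, depth=0):
    """Recursively bisect [lo, hi] until cells hold at most leaf_size particles."""
    cell = Cell(lo, hi, idx)
    if len(idx) > leaf_size and depth < 32:   # depth cap guards coincident points
        mid = 0.5 * (lo + hi)
        left = [i for i in idx if xs[i] < mid]
        right = [i for i in idx if xs[i] >= mid]
        for part, a, b in ((left, lo, mid), (right, mid, hi)):
            if part:
                cell.children.append(build(xs, part, a, b, leaf_size, depth + 1))
    return cell


def traverse(a, b, theta, far, near):
    """Dual tree traversal of target cell a against source cell b."""
    dist = abs(a.center - b.center)
    if a.radius + b.radius < theta * dist:
        far.append((a, b))         # well separated: cell-cell interaction
    elif not a.children and not b.children:
        near.append((a, b))        # two leaves: direct summation
    elif not b.children or (a.children and a.radius >= b.radius):
        for child in a.children:   # split the larger (or only splittable) cell
            traverse(child, b, theta, far, near)
    else:
        for child in b.children:
            traverse(a, child, theta, far, near)


def interaction_lists(xs, theta=0.5, leaf_size=2):
    """Return (far, near) lists of cell pairs for particle positions xs."""
    root = build(xs, list(range(len(xs))), min(xs), max(xs), leaf_size)
    far, near = [], []
    traverse(root, root, theta, far, near)
    return far, near
```

Calling `interaction_lists([i / 16 for i in range(16)])` returns the two pair lists; each ordered particle pair is covered by exactly one cell pair, since every split partitions a cell's particles among its children. In a real FMM the far pairs would drive multipole-to-local translations and the near pairs a particle-particle kernel.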