2014
DOI: 10.1007/978-3-319-09873-9_60

Parallel Dual Tree Traversal on Multi-core and Many-core Architectures for Astrophysical N-body Simulations

Abstract: In astrophysical N-body simulations, Dehnen's algorithm, implemented in the serial falcON code and based on a dual tree traversal, is faster than serial Barnes-Hut tree-codes, but outperformed by parallel CPU and GPU tree-codes. In this paper, we present a parallel dual tree traversal, implemented in the pfalcON code, targeting multi-core CPUs and manycore architectures (Xeon Phi). We focus here on both performance and portability, while preserving Dehnen's original algorithm. We first use task parallelism, wi…

Citations: cited by 11 publications (20 citation statements)
References: 13 publications
“…The line segments represent the original level-wise Hilbert orders. However, maximizing thread-level parallelism has been proven more successful using the Dual Tree Traversal (DTT) approach [38,25], which is known for its adaptability to emerging multi- and many-core architectures. For example, to find the scattered field at target positions encoded by 0000 Hilbert order in Figure 2(b), DTT simultaneously traverses source and target trees, and recursively uncovers the cell-cell interaction list (see Figure 2(a)).…”
Section: 2 (mentioning)
confidence: 99%
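The recursive traversal described in the statement above can be illustrated with a minimal sketch. It assumes a 1D binary tree and a simple opening criterion; `Cell`, `build`, `well_separated`, `dual_traverse` and the parameter `theta` are hypothetical names chosen for illustration and are not taken from pfalcON or the cited works.

```python
from dataclasses import dataclass, field


@dataclass
class Cell:
    """Hypothetical 1D tree cell covering the interval [lo, hi]."""
    lo: float
    hi: float
    children: list = field(default_factory=list)

    @property
    def center(self):
        return 0.5 * (self.lo + self.hi)

    @property
    def radius(self):
        return 0.5 * (self.hi - self.lo)


def build(lo, hi, depth):
    """Build a complete binary tree of the given depth over [lo, hi]."""
    cell = Cell(lo, hi)
    if depth > 0:
        mid = 0.5 * (lo + hi)
        cell.children = [build(lo, mid, depth - 1), build(mid, hi, depth - 1)]
    return cell


def well_separated(a, b, theta):
    """Assumed opening criterion: cell sizes are small relative to distance."""
    return a.radius + b.radius < theta * abs(a.center - b.center)


def dual_traverse(target, source, far, near, theta=0.5):
    """Walk both trees at once, collecting cell-cell interaction pairs."""
    if well_separated(target, source, theta):
        far.append((target, source))      # approximate cell-cell interaction
    elif not target.children and not source.children:
        near.append((target, source))     # leaf-leaf pair: direct evaluation
    elif not source.children or (
        target.children and target.radius >= source.radius
    ):
        for child in target.children:     # split the target cell
            dual_traverse(child, source, far, near, theta)
    else:
        for child in source.children:     # split the source cell
            dual_traverse(target, child, far, near, theta)


def count_leaves(cell):
    """Number of leaf cells below (and including) a cell."""
    if not cell.children:
        return 1
    return sum(count_leaves(c) for c in cell.children)


root = build(0.0, 1.0, 3)
far, near = [], []
dual_traverse(root, root, far, near)
print(len(far), "far pairs,", len(near), "near pairs")
```

In this sketch every leaf-leaf pair is accounted for exactly once, either directly in `near` or through an enclosing pair in `far`. A task-parallel variant would spawn the recursive calls as tasks, which is where synchronization on shared target cells becomes necessary.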
“…We rely on the pfalcON code, where the DTT is performed in parallel on multi-core CPUs thanks to task synchronizations based on atomic operations [Lange and Fortin, 2014]. Another task-based DTT parallelization has been achieved thanks to a rewriting of the DTT [Taura et al., 2012] and implemented in the exaFMM code.…”
Section: Related Work and Positioning (mentioning)
confidence: 99%
“…In Lange and Fortin [2014], falcON has been parallelized in pfalcON on multi-core CPUs and on Intel Xeon Phi thanks to task-based parallelism (e.g. with OpenMP).…”
Section: Falcon and Pfalcon Codes (mentioning)
confidence: 99%
“…The use of parallelism combined with SIMD/AVX2 vectorization has been explored in several works involving N-body problems and tree traversal [Yokota 2012, Long Wang et al 2015, Arora et al 2009], especially where gravitational forces were computed directly. This is the case in [Lange and Fortin 2014], which uses the dual tree method and does not use SIMD intrinsics directly, leaving the vectorization work to the compiler. This is possible mainly because in these methods the interactions are of the cell-cell (C-C) type, which facilitates automatic vectorization.…”
Section: Related Work (unclassified)