2011
DOI: 10.2528/pier10120802
On Openmp Parallelization of the Multilevel Fast Multipole Algorithm

Abstract: Compared with MPI, OpenMP provides an easy way to parallelize the multilevel fast multipole algorithm (MLFMA) on shared-memory systems. However, an OpenMP implementation has many pitfalls, because the different parts of MLFMA have distinct numerical characteristics arising from its complicated algorithm structure. These pitfalls often cause very low efficiency, especially when many threads are employed. Through an in-depth investigation of these pitfalls with analysis and numerical experim…

Cited by 20 publications (9 citation statements)
References 24 publications
“…However, the overhead of FE-IIEE, corresponding to the brute-force accurate evaluation of the convolution-type operation of (5) (and its analogue for the normal derivative), is reflected as a shift of the FE-IIEE line towards larger computational times. Although an efficient implementation of an acceleration technique (such as the FMM (Fast Multipole Method) [44][45][46] or ACA (Adaptive Cross Approximation) [47]) using the hierarchy of the meshes (tree-type data structures) is outside the scope of this paper, an estimation of the performance of FE-IIEE with an acceleration technique has been included. It corresponds to the line labeled "FE-IIEE (fast)".…”
Section: Numerical Resultsmentioning
confidence: 99%
“…Previous parallel algorithms (e.g., parallel FDTD [10,11], the parallel direct solver [12], and parallel MLFMM [13,14]) were mainly implemented on CPU clusters. Owing to their high performance/cost ratio and rapid performance growth, GPU clusters are becoming increasingly popular.…”
Section: Introductionmentioning
confidence: 99%
“…The finest ID level is the sixth level. Z^(6)_NFI consumes 453 MB of memory, while the S^(i) matrices at all ID levels require over 8.5 GB in total. The ID skeletonization approximation reduces the memory requirement by a factor of 6.0.…”
Section: The Two-cylinder Targetmentioning
confidence: 99%
“…The computational complexity of MoM is O(N^3) in CPU time for a conventional direct solver such as LU, and O(mN^2) for an iterative algorithm, where N is the number of unknowns and m the iteration count. Consequently, iterative solvers combined with fast algorithms [1][2][3][4][5][6][7][8][9] are more popular than direct solvers for MoM systems. Most of these fast algorithms decompose the interactions in a MoM system into near-field interactions (NFIs) and far-field interactions (FFIs), and then approximate the FFIs in some efficient way.…”
Section: Introductionmentioning
confidence: 99%