2018
DOI: 10.1007/978-3-319-78024-5_10

Parallel Assembly of ACA BEM Matrices on Xeon Phi Clusters

Cited by 3 publications (4 citation statements)
References 14 publications
“…This GPU would also achieve much higher performance in the H-matrix setup, since our model BEM code would run much faster than on the rather old Tesla K20X cards. In the future, we also aim at combining domain-decomposition parallelization with task-based parallelization, as for example in [3,31], to solve even larger problem sizes.…”
Section: Performance and Scalability (mentioning)
confidence: 99%
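
The combination of coarse-grain domain decomposition with fine-grain task-based parallelism that the citing authors mention can be illustrated with a minimal OpenMP tasking sketch over ACA block assembly. This is a generic illustration under assumed types, not the scheme of [3,31] or of the reviewed paper: Block, assemble_aca_block, and assemble_dense_block are hypothetical placeholders (compile with -fopenmp).

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

struct Block {
    int row_cluster;   // index of the test cluster
    int col_cluster;   // index of the trial cluster
    bool admissible;   // admissible pairs are compressed by ACA
};

// Hypothetical assembly kernels, standing in for the real quadrature code.
static void assemble_dense_block(const Block&) { /* near-field: full quadrature */ }
static void assemble_aca_block(const Block&)   { /* far-field: adaptive cross approximation */ }

void assemble_blocks(const std::vector<Block>& blocks) {
    #pragma omp parallel
    #pragma omp single
    {
        // One task per block; the runtime balances the uneven block
        // costs across the threads of this (per-subdomain) process.
        for (std::size_t i = 0; i < blocks.size(); ++i) {
            #pragma omp task firstprivate(i) shared(blocks)
            {
                const Block& b = blocks[i];
                if (b.admissible) assemble_aca_block(b);
                else              assemble_dense_block(b);
            }
        }
    }   // tasks are guaranteed to finish before the parallel region ends
}

int main() {
    std::vector<Block> blocks = { {0, 0, false}, {0, 1, true}, {1, 0, true} };
    assemble_blocks(blocks);
    std::printf("assembled %zu blocks\n", blocks.size());
    return 0;
}
```

Tasking suits ACA assembly because block costs are not known in advance (the rank discovered by ACA varies per block), so letting the runtime schedule one task per block balances load better than a static split; the domain-decomposition layer (e.g. one MPI rank per subdomain) is deliberately omitted here.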
“…In the future, we aim at improving the multi-GPU load balancing by techniques proposed, e.g., in [3,31]. However, while these techniques work well in the context of non-batched operations, we assume that their combination with batching will still be sub-optimal on GPUs.…”
Section: Performance and Scalability (mentioning)
confidence: 99%
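
For intuition on the load-balancing problem raised above, the sketch below applies the classic longest-processing-time (LPT) heuristic to per-block cost estimates, assigning each block to the currently least-loaded GPU. This is a toy under assumptions, not the technique of [3,31]: BlockJob and the cost model are invented for the example, and the batching constraints the authors point out are not modeled.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdio>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

struct BlockJob {
    int id;        // index of the matrix block
    double cost;   // estimated work, e.g. rows*cols (dense) or rank*(rows+cols) (ACA)
};

std::vector<int> assign_to_gpus(std::vector<BlockJob> jobs, int num_gpus) {
    // LPT: sort jobs by decreasing cost, then always hand the next job
    // to whichever GPU currently carries the least accumulated cost.
    std::sort(jobs.begin(), jobs.end(),
              [](const BlockJob& a, const BlockJob& b) { return a.cost > b.cost; });

    using Load = std::pair<double, int>;   // (accumulated cost, gpu id)
    std::priority_queue<Load, std::vector<Load>, std::greater<Load>> least_loaded;
    for (int g = 0; g < num_gpus; ++g) least_loaded.push({0.0, g});

    std::vector<int> gpu_of_block(jobs.size(), 0);
    for (const BlockJob& j : jobs) {
        auto [load, gpu] = least_loaded.top();
        least_loaded.pop();
        gpu_of_block[j.id] = gpu;
        least_loaded.push({load + j.cost, gpu});
    }
    return gpu_of_block;
}

int main() {
    std::vector<BlockJob> jobs = { {0, 9.0}, {1, 4.0}, {2, 7.0}, {3, 2.0} };
    const std::vector<int> owner = assign_to_gpus(jobs, 2);
    for (std::size_t i = 0; i < owner.size(); ++i)
        std::printf("block %zu -> GPU %d\n", i, owner[i]);
    return 0;
}
```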
“…In Sect. 3 we propose a strategy to parallelize the assembly of the MTF matrix blocks and their application in an iterative solver, based on the approach presented in [11-13] for single-domain problems. Apart from the distributed parallelism, the method takes full advantage of the BEM4I library [14,20,21] and its assemblers, which are parallelized in shared memory and vectorized via OpenMP.…”
Section: Introduction (mentioning)
confidence: 99%
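
The shared-memory parallelization and OpenMP vectorization attributed to the BEM4I assemblers can be sketched as follows: threads share the (i, j) entry loop while SIMD handles the innermost quadrature loop. This is a toy collocation-style assembler for the Laplace single-layer kernel, not BEM4I code; the function name, the flat array layout, and the kernel are assumptions for illustration (compile with -fopenmp).

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

// Assemble A(i,j) = sum_q wq[j*Q+q] / (4*pi*|x_i - y_{j,q}|):
// a toy collocation discretization of the Laplace single-layer kernel.
void assemble_dense(std::vector<double>& A, int m, int n,
                    const std::vector<double>& x,    // 3*m collocation points
                    const std::vector<double>& yq,   // 3*n*Q quadrature points
                    const std::vector<double>& wq,   // n*Q quadrature weights
                    int Q) {
    const double pi = 3.14159265358979323846;
    const double c = 1.0 / (4.0 * pi);
    // Threads split the entry loops; dynamic scheduling absorbs cost variation.
    #pragma omp parallel for collapse(2) schedule(dynamic)
    for (int i = 0; i < m; ++i) {
        for (int j = 0; j < n; ++j) {
            double aij = 0.0;
            // Innermost quadrature loop is a reduction the compiler can
            // vectorize; omp simd makes the intent explicit.
            #pragma omp simd reduction(+ : aij)
            for (int q = 0; q < Q; ++q) {
                const int k = 3 * (j * Q + q);
                const double dx = x[3 * i]     - yq[k];
                const double dy = x[3 * i + 1] - yq[k + 1];
                const double dz = x[3 * i + 2] - yq[k + 2];
                aij += wq[j * Q + q] / std::sqrt(dx * dx + dy * dy + dz * dz);
            }
            A[i * n + j] = c * aij;
        }
    }
}

int main() {
    const int m = 4, n = 4, Q = 2;
    std::vector<double> A(m * n);
    std::vector<double> x(3 * m, 0.5);       // all collocation points at (0.5, 0.5, 0.5)
    std::vector<double> yq(3 * n * Q, 1.5);  // all quadrature points at (1.5, 1.5, 1.5)
    std::vector<double> wq(n * Q, 0.25);
    assemble_dense(A, m, n, x, yq, wq, Q);
    std::printf("A(0,0) = %g\n", A[0]);
    return 0;
}
```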