2020
DOI: 10.1109/access.2020.2993103
GPU Acceleration of a Non-Standard Finite Element Mesh Truncation Technique for Electromagnetics

Abstract: The emergence of General Purpose Graphics Processing Units (GPGPUs) provides new opportunities to accelerate applications involving a large number of regular computations. However, properly leveraging the computational resources of graphical processors is a very challenging task. In this paper, we use this kind of device to parallelize FE-IIEE (Finite Element-Iterative Integral Equation Evaluation), a non-standard finite element mesh truncation technique introduced by two of the authors. This application is co…

Cited by 4 publications (2 citation statements)
References 47 publications
“…As we increase both parameters, the spatial cost of the algorithm also increases, and we exhaust the resources available in the GPU. As we pointed out in [14], the main factor that limits the performance of this parallel algorithm is the number of registers available in each streaming multiprocessor. The computations involved by each iteration of the loop on S involve the use of a very large number of small vectors and scalar variables local to every CUDA thread, which use up even the large number of registers available on most modern GPUs.…”
Section: CUDA Results on GPU
confidence: 99%
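The register-pressure bottleneck described in this citation can be made concrete with a small sketch. The kernel below is purely illustrative (it is not the authors' FE-IIEE code): each thread holds several small per-thread arrays and scalars, which the compiler maps to registers, and the host then queries how many registers the compiler actually assigned via `cudaFuncGetAttributes`. When that per-thread count is high, fewer warps fit in each streaming multiprocessor and occupancy drops.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel with a register-heavy per-thread working set,
// mimicking the pattern described in the citation: many small local
// vectors and scalars per CUDA thread inside the loop over S.
__global__ void registerHeavyKernel(const float *in, float *out, int n) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;

    // Small per-thread arrays: the compiler keeps these in registers
    // when it can, which is fast but consumes the SM's register file.
    float r[3], rp[3], acc[3] = {0.f, 0.f, 0.f};
    for (int k = 0; k < 3; ++k) r[k] = in[3 * tid + k];

    for (int s = 0; s < 64; ++s) {        // placeholder loop over S
        for (int k = 0; k < 3; ++k) {
            rp[k] = r[k] - (float)s;      // placeholder geometry term
            acc[k] += rp[k] * rp[k];
        }
    }
    out[tid] = acc[0] + acc[1] + acc[2];
}

int main() {
    // Ask the runtime how many registers per thread the compiler used;
    // dividing the SM's register file by this number bounds how many
    // threads can be resident at once.
    cudaFuncAttributes attr;
    cudaFuncGetAttributes(&attr, registerHeavyKernel);
    printf("registers per thread: %d\n", attr.numRegs);
    return 0;
}
```

Compiling with `nvcc --ptxas-options=-v` reports the same per-kernel register count at build time, which is the usual way to diagnose this limit.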
“…The algorithm cuda S, introduced in [14], is based on a kernel that implements all computations involved in each iteration of the loop on S. The algorithm tries to optimise the management of the different kinds of GPU memory by leveraging the register file and the shared memory of the GPU. Specifically, we copy the elements of vector currents to the shared memory of each block of threads in order to reduce the number of accesses to global memory.…”
Section: CUDA Parallelization on GPU
confidence: 99%
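The shared-memory staging strategy this citation describes can be sketched as follows. This is an illustrative tile-based pattern under stated assumptions, not the actual cuda S implementation from [14]: each block cooperatively copies a tile of the currents vector into shared memory once, so the repeated reads inside the loop on S hit fast on-chip memory instead of global memory. The names (`currents`, `TILE`, `iterateOverS`) are hypothetical.

```cuda
#include <cuda_runtime.h>

#define TILE 256  // threads per block and tile width (assumed)

// Sketch: stage a tile of `currents` into shared memory, then let every
// thread in the block reuse it across the loop on S, cutting the number
// of global-memory accesses roughly by the block size.
__global__ void iterateOverS(const float *currents, float *field, int nS) {
    __shared__ float sCurrents[TILE];
    int tid = blockIdx.x * blockDim.x + threadIdx.x;

    float acc = 0.f;
    for (int base = 0; base < nS; base += TILE) {
        // Each thread loads one element of the tile from global memory.
        int idx = base + threadIdx.x;
        sCurrents[threadIdx.x] = (idx < nS) ? currents[idx] : 0.f;
        __syncthreads();                  // tile fully staged

        int limit = min(TILE, nS - base);
        for (int s = 0; s < limit; ++s)   // every thread reuses the tile
            acc += sCurrents[s];          // placeholder for the kernel math
        __syncthreads();                  // safe to overwrite the tile
    }
    if (tid < nS) field[tid] = acc;
}
```

The two `__syncthreads()` barriers are essential: the first guarantees the tile is complete before any thread reads it, and the second prevents a fast thread from overwriting the tile while slower threads are still reading it.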