An OpenMP-CUDA Implementation of Multilevel Fast Multipole Algorithm for Electromagnetic Simulation on Multi-GPU Computing Systems

Guan, Jian; Yan, Su; Jin, Jian‐Ming

doi:10.1109/tap.2013.2258882

Cited by 66 publications

(34 citation statements)

References 19 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…The implementation of aggregation and disaggregation at finest level on GPU was proposed by means of allocating a thread to each spectrum point [16]. To increase the utilization of GPU on Kepler architecture (GK110) further, which has a maximum value of 32 threads in one warp, a smart scheme is designed with two steps.…”

Section: Principle Of Mlfmm and Its Optimization On Gpumentioning

confidence: 99%

“…Meanwhile, all the 32 threads in a warp can read the data from constant memory according to a certain spectrum point through the read-only cache by using the instruction "__ldg()" or "__restrict__". Unlike the strategy of thread-based task assignment proposed for aggregation and disaggregation at coarser level [16], we make a further step in data storage by using the four times larger texture memory in Kepler than Fermi. Since the local inter-/anterpolation accesses neighboring data frequently, it had better store data in a certain pattern form looking like the geometric topology in texture memory.…”

Section: Principle Of Mlfmm and Its Optimization On Gpumentioning

confidence: 99%

“…The programming combination between GPU and CPU can fully depend on the easy-to-use language CUDA-C/PTX. To the authors' knowledge, the GPU/CPU heterogeneous platform is the most popular choice especially in CEM, such as the GPU-based FDTD [15], MLFMM [16][17][18][19], AIM [20], P-FFT [21], MoM [22,23], and higher-order MoM [24]. In 2013, an impressive implementation of MLFMM by OpenMP-CUDA was realized on Fermi architecture (NVIDIA Tesla C2050), which achieved much higher performance than those before [16].…”

Section: Introductionmentioning

confidence: 99%

See 2 more Smart Citations

High Performance Computing of Complex Electromagnetic Algorithms Based on GPU/CPU Heterogeneous Platform and Its Applications to EM Scattering and Multilayered Medium Structure

Song

Mu²,

Zhou

2017

International Journal of Antennas and Propagation

View full text Add to dashboard Cite

The fast and accurate numerical analysis for large-scale objects and complex structures is essential to electromagnetic simulation and design. Comparing to the exploration in EM algorithms from mathematical point of view, the computer programming realization is coordinately significant while keeping up with the development of hardware architectures. Unlike the previous parallel algorithms or those implemented by means of parallel programming on multicore CPU with OpenMP or on a cluster of computers with MPI, the new type of large-scale parallel processor based on graphics processing unit (GPU) has shown impressive ability in various scenarios of supercomputing, while its application in computational electromagnetics is especially expected. This paper introduces our recent work on high performance computing based on GPU/CPU heterogeneous platform and its application to EM scattering problems and planar multilayered medium structure, including a novel realization of OpenMP-CUDA-MLFMM, a developed ACA method and a deeply optimized CG-FFT method. With fruitful numerical examples and their obvious enhancement in efficiencies, it is convincing to keep on deeply investigating and understanding the computer hardware and their operating mechanism in the future.

show abstract

Section: Principle Of Mlfmm and Its Optimization On Gpumentioning

confidence: 99%

Section: Principle Of Mlfmm and Its Optimization On Gpumentioning

confidence: 99%

Section: Introductionmentioning

confidence: 99%

See 1 more Smart Citation

High Performance Computing of Complex Electromagnetic Algorithms Based on GPU/CPU Heterogeneous Platform and Its Applications to EM Scattering and Multilayered Medium Structure

Song

Mu²,

Zhou

2017

International Journal of Antennas and Propagation

View full text Add to dashboard Cite

show abstract

“…Indeed, many accurate and fast numerical methods have been developed in the last decades for scattering calculations, fast antenna analysis and RCS predictions, facing the important trade-off between the accuracy of the results and the rapidity of the simulations [2][3][4][5][6][7].…”

Section: Introductionmentioning

confidence: 99%

“…Parallelism is the future of computing [9] and the interest of the Antennas and Propagation community in topics of high performance computing and, in particular, of parallel programming on GPUs to face computationally burdened problems has been remarkable, as witnessed by [2][3][4][5][6][7] and by other electromagnetic numerical methods which have benefitted from GPU computing [10][11][12][13][14][15][16][17]. From this starting point, it is clear that the electromagnetic community can take advantage of this technological evolution to employ ever-more sophisticated numerical methods.…”

Section: Introductionmentioning

confidence: 99%

Fast, Phase-Only Synthesis of Aperiodic Reflectarrays Using Nuffts and Cuda

Capozzoli¹,

Curcio²,

Liseno³

et al. 2016

PIER

View full text Add to dashboard Cite

Abstract-We deal with one of the computationally most critical steps of the Phase-Only synthesis of aperiodic reflectarrays, namely the fast evaluation of the radiation operator. We present an approach exploiting the use of a fast numerical algorithm using 2D Non-Uniform FFTs (NUFFTs) of NED (NonEquispaced Data) and NER (Non-Equispaced Results) type and of parallel processing on Graphics Processing Units (GPUs). We extend the approach in K. Fourmont, J. Fourier Anal. Appl., Vol. 9, No. 5, 431-540, 2013 for implementing NUFFT routines to the 2D case and illustrate the parallel strategies to accelerate the approach. In particular, we show how the two levels of parallelism intrinsic in the interpolation step of the 2D NED-NUFFT can be fruitfully exploited by adopting dynamic parallelism, a feature made available in one of the latest architecture of NVIDIA cards. The presented synthesis results show that the introduction of further degrees of freedom (positions) allows improving the performance with respect to periodic reflectarrays. Also, the possibility of adopting aperiodic reflectarrays of reduced number of elements for fixed performance is demonstrated.

show abstract

Toward Supporting Multi-GPU Targets via Taskloop and User-Defined Schedules

Kale

Curtis

et al. 2020

Lecture Notes in Computer Science

View full text Add to dashboard Cite

An OpenMP-CUDA Implementation of Multilevel Fast Multipole Algorithm for Electromagnetic Simulation on Multi-GPU Computing Systems

Cited by 66 publications

References 19 publications

High Performance Computing of Complex Electromagnetic Algorithms Based on GPU/CPU Heterogeneous Platform and Its Applications to EM Scattering and Multilayered Medium Structure

High Performance Computing of Complex Electromagnetic Algorithms Based on GPU/CPU Heterogeneous Platform and Its Applications to EM Scattering and Multilayered Medium Structure

Fast, Phase-Only Synthesis of Aperiodic Reflectarrays Using Nuffts and Cuda

Toward Supporting Multi-GPU Targets via Taskloop and User-Defined Schedules

Contact Info

Product

Resources

About