Massively parallel computing technology has undergone a paradigm shift in recent years. The driving force behind this change is the demand for better graphics hardware in personal computers. The latest graphics processors from ATI, Intel and NVIDIA provide advanced multi-processor hardware to support popular graphics interfaces such as DirectX and OpenGL. These new graphics processing units (GPUs) employ the Single Instruction Multiple Data (SIMD) computing model, in which all processors in the GPU work simultaneously on a vast amount of data while executing identical instructions. This approach is well suited to graphics workloads because all pixels in an image require identical transformation and mapping instructions (a minimal kernel sketch illustrating this model appears at the end of this section).

The SIMD computing model, which revolutionized the GPU industry, is now making its way into mainstream computing. Matrix operations, which are at the core of many computer graphics algorithms, also appear in many linear algebra routines. More generally, numerical procedures that execute identical instructions on large amounts of data are suitable candidates for the SIMD hardware in advanced GPUs. However, developing parallel algorithms for GPU hardware is not straightforward, and the task is further complicated by the lack of a good software development kit (SDK) that encapsulates the hardware details in a software model.

At the time of writing, ATI has released a Stream Computing SDK [1], whereas NVIDIA has released a new version of its Compute Unified Device Architecture (CUDA) SDK [2]. In addition, NVIDIA is working on an Open Computing Language (OpenCL) for programming GPU hardware [3]. In this paper, a Transmission Line Matrix (TLM) engine implemented using the CUDA SDK is presented.
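To make the SIMD model concrete, the following sketch shows the pattern in CUDA: one kernel, launched over many threads, applies an identical instruction stream to different data elements. This is an illustrative example only, not part of the TLM engine presented in this paper; the kernel name scalePixels and the per-pixel scaling operation are assumptions chosen for clarity.

#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

/* Illustrative kernel (hypothetical): every thread executes the same
   instruction stream on a different data element -- the SIMD idea. */
__global__ void scalePixels(float *pixels, float gain, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  /* unique element index */
    if (i < n)                                      /* guard the last block  */
        pixels[i] *= gain;                          /* same op, different data */
}

int main(void)
{
    const int n = 1 << 20;                  /* one million "pixels" */
    const size_t bytes = n * sizeof(float);

    float *h = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    float *d;
    cudaMalloc(&d, bytes);
    cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    scalePixels<<<blocks, threads>>>(d, 0.5f, n);   /* one kernel, n data items */

    cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost);
    printf("pixel[0] = %f\n", h[0]);                /* expect 0.500000 */

    cudaFree(d);
    free(h);
    return 0;
}

Every one of the roughly one million threads runs the same kernel body on its own array element, mirroring the per-pixel uniformity of graphics operations noted above.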