2011
DOI: 10.1134/s1995423911010058

Implementation of algorithms with a fine-grained parallelism on GPUs

Cited by 7 publications (8 citation statements)
References 4 publications
“…Such accelerations are attainable for models and algorithms that are inherently parallel, as is the case with our cellular nonlinear network model. Such processors normally have a number M of kernels, of the order of tens to hundreds, and offer an acceleration of the computation speed that is often smaller than M. Cellular computing models implemented with CUDA technology have been considered, and speedups of about 0.3-0.5M are often reported [9]. We performed our own experiments with an Nvidia GeForce 8800GT GPU (addressing 512 MB of memory).…”
Section: B. CUDA/GPU Acceleration (mentioning)
confidence: 99%
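As a rough illustration of the one-thread-per-cell CUDA mapping that underlies such speedup figures, the sketch below updates a 2D cellular (nonlinear network) grid with one thread per cell. The kernel name, the nearest-neighbour coupling weight w, and the saturating nonlinearity are assumptions for illustration only, not taken from the cited papers.

// Hypothetical sketch: one CUDA thread per cell of a cellular nonlinear
// network; neighbour coupling and nonlinearity are placeholders.
__global__ void updateCells(const float* in, float* out, int W, int H, float w)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= W || y >= H) return;

    int i = y * W + x;
    // Four nearest neighbours, with clamped (zero-flux) boundaries.
    float left  = in[y * W + max(x - 1, 0)];
    float right = in[y * W + min(x + 1, W - 1)];
    float up    = in[max(y - 1, 0) * W + x];
    float down  = in[min(y + 1, H - 1) * W + x];

    // Simple saturating cell update driven by neighbour coupling.
    float v = in[i] + w * (left + right + up + down - 4.0f * in[i]);
    out[i] = tanhf(v);
}

Such a kernel would typically be launched once per time step with 16x16 thread blocks and ping-pong input/output buffers, which is the pattern the reported M-relative speedups refer to.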
“…Various solutions for implementing cellular automata on such platforms have recently been presented in the literature (e.g. [14][16]), making it easy to adjust them to the model (5), which has many similarities to a continuous-state cellular automaton. Figure 4 presents the dynamics of both the "u" and "v" layers for T=200 and for a face image cropped from one of the faces in the database [17].…”
Section: B. The Discrete-Time RD-CNN as Image Processor (mentioning)
confidence: 99%
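A minimal sketch of how such a two-layer, continuous-state update might look as a CUDA kernel, assuming a generic reaction-diffusion step over "u" and "v" layers; the diffusion constants, time step, and reaction terms are placeholders, not the model (5) of the cited paper.

// Hypothetical 5-point Laplacian with clamped (zero-flux) boundaries.
__device__ float lap5(const float* a, int x, int y, int W, int H)
{
    float c = a[y * W + x];
    float l = a[y * W + max(x - 1, 0)];
    float r = a[y * W + min(x + 1, W - 1)];
    float t = a[max(y - 1, 0) * W + x];
    float b = a[min(y + 1, H - 1) * W + x];
    return l + r + t + b - 4.0f * c;
}

// One explicit Euler step for both layers; reaction terms are illustrative.
__global__ void rdStep(const float* u, const float* v,
                       float* uNext, float* vNext,
                       int W, int H, float Du, float Dv, float dt)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= W || y >= H) return;
    int i = y * W + x;

    // Placeholder local (reaction) terms; a real RD-CNN uses its own nonlinearity.
    float fu = u[i] - u[i] * u[i] * u[i] - v[i];
    float fv = u[i] - v[i];

    uNext[i] = u[i] + dt * (Du * lap5(u, x, y, W, H) + fu);
    vNext[i] = v[i] + dt * (Dv * lap5(v, x, y, W, H) + fv);
}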
“…Several analogue chips have already been proposed in the literature [10][11][12][13], although we consider that, from a practical point of view, digital models would be better. Speeding up such models using GPU/CUDA approaches [14] is also a convenient solution in terms of the ratio between performance and cost. In this paper we propose the development of an RD-CNN processor suitable for digital implementations (PC, GPU cards, FPGAs, etc.)…”
Section: Introduction (mentioning)
confidence: 99%
“…All cores have access to the global memory (off-chip, slow), which is used to exchange data between threads and between the GPU and the host CPU. While the implementation presented here uses global memory, further optimization exploiting the faster shared memory can be done, resulting in even better performance [4].…”
Section: B. Hardware and Optimization (mentioning)
confidence: 99%
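To make the global- versus shared-memory remark concrete, here is a hedged sketch of the usual tiling optimization: each thread block stages its cells plus a one-cell halo into on-chip shared memory once, then reads the stencil from there instead of re-reading global memory. The tile size, kernel name, and the cell update (reused from the earlier sketch) are illustrative assumptions, not the cited implementation.

#define TILE 16

// Hypothetical shared-memory variant of the per-cell update kernel.
__global__ void updateCellsShared(const float* in, float* out, int W, int H, float w)
{
    __shared__ float tile[TILE + 2][TILE + 2];

    int x = blockIdx.x * TILE + threadIdx.x;
    int y = blockIdx.y * TILE + threadIdx.y;
    int tx = threadIdx.x + 1, ty = threadIdx.y + 1;

    // Clamp to the grid so border blocks stage valid (zero-flux) values.
    int cx = min(max(x, 0), W - 1);
    int cy = min(max(y, 0), H - 1);

    // Stage the interior cell plus a one-cell halo into shared memory.
    tile[ty][tx] = in[cy * W + cx];
    if (threadIdx.x == 0)        tile[ty][0]        = in[cy * W + max(cx - 1, 0)];
    if (threadIdx.x == TILE - 1) tile[ty][TILE + 1] = in[cy * W + min(cx + 1, W - 1)];
    if (threadIdx.y == 0)        tile[0][tx]        = in[max(cy - 1, 0) * W + cx];
    if (threadIdx.y == TILE - 1) tile[TILE + 1][tx] = in[min(cy + 1, H - 1) * W + cx];
    __syncthreads();

    if (x >= W || y >= H) return;

    // Same update as before, now served from on-chip memory.
    float c = tile[ty][tx];
    float v = c + w * (tile[ty][tx - 1] + tile[ty][tx + 1] +
                       tile[ty - 1][tx] + tile[ty + 1][tx] - 4.0f * c);
    out[y * W + x] = tanhf(v);
}

The gain comes from converting several slow global-memory loads per cell into a single staged load that neighbouring threads in the block then share.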