Manuel Ujaldón scite author profile

Membrane Computing is a discipline aiming to abstract formal computing models, called membrane systems or P systems, from the structure and functioning of the living cells as well as from the cooperation of cells in tissues, organs, and other higher order structures. This framework provides polynomial time solutions to NP-complete problems by trading space for time, and whose efficient simulation poses challenges in three different aspects: an intrinsic massively parallelism of P systems, an exponential computational workspace, and a non-intensive floating point nature.In this paper, we analyze the simulation of a family of recognizer P systems with active membranes that solves the Satisfiability problem in linear time on different instances of Graphics Processing Units (GPUs). For an efficient handling of the exponential workspace created by the P systems computation, we enable different data policies to increase memory bandwidth and exploit data locality through tiling and dynamic queues. Parallelism inherent to the target P system is also managed to demonstrate that GPUs offer a valid alternative for high-performance computing at a considerably lower cost. Furthermore, scalability is demonstrated on the way to the largest problem size we were able to run, and considering the new hardware generation from Nvidia, Fermi, for a total speed-up exceeding four orders of magnitude when running our simulations on the Tesla S2050 server.

show abstract

Vienna-Fortran/HPF extensions for sparse and irregular problems and their compilation

Ujaldón

Zapata

Chapman

et al. 1997

IEEE Trans. Parallel Distrib. Syst.

View full text Add to dashboard Cite

Vienna Fortran, High Performance Fortran (HPF), and other data parallel languages have been introduced to allow the programming of massively parallel distributed-memory machines (DMMP) at a relatively high level of abstraction, based on the SPMD paradigm. Their main features include directives to express the distribution of data and computations across the processors of a machine. In this paper, we use Vienna-Fortran as a general framework for dealing with sparse data structures. We describe new methods for the representation and distribution of such data on DMMPs, and propose simple language features that permit the user to characterize a matrix as "sparse" and specify the associated representation. Together with the data distribution for the matrix, this enables the compiler and runtime system to translate sequential sparse code into explicitly parallel message-passing code. We develop new compilation and runtime techniques, which focus on achieving storage economy and reducing communication overhead in the target program. The overall result is a powerful mechanism for dealing efficiently with sparse matrices in data parallel languages and their compilers for DMMPs.Index Terms-Data-parallel language and compiler, sparse computation, distributed-memory machines, runtime support.

show abstract

The 2D wavelet transform on emerging architectures: GPUs and multicores

Franco

Bernabé

Fernández

et al. 2011

J Real-Time Image Proc

View full text Add to dashboard Cite

We present in this paper several implementations of the 3D Fast Wavelet Transform (3D-FWT) on multicore CPUs and manycore GPUs. On the GPU side, we focus on CUDA and OpenCL programming to develop methods for an efficient mapping on manycores. On multicore CPUs, OpenMP and Pthreads are used as counterparts to maximize parallelism, and renowned techniques like tiling and blocking are exploited to optimize the use of memory. We evaluate these proposals and make a comparison between a new Fermi Tesla C2050 and an Intel Core 2 Quad Q6700. Speedups of the CUDA version are the best results, improving the execution times on CPU, ranging from 5.3x to 7.4x for different image sizes, and up to 81 times faster when communications are neglected. Meanwhile, OpenCL obtains solid gains which range from 2x factors on small frame sizes to 3x factors on larger ones.

show abstract

Parallelization strategies for ant colony optimisation on GPUs

Cecilia

Garcia

Ujaldón

et al. 2011

View full text Add to dashboard Cite

Abstract-Ant Colony Optimisation (ACO) is an effective population-based meta-heuristic for the solution of a wide variety of problems. As a population-based algorithm, its computation is intrinsically massively parallel, and it is therefore theoretically well-suited for implementation on Graphics Processing Units (GPUs). The ACO algorithm comprises two main stages: Tour construction and Pheromone update. The former has been previously implemented on the GPU, using a task-based parallelism approach. However, up until now, the latter has always been implemented on the CPU. In this paper, we discuss several parallelisation strategies for both stages of the ACO algorithm on the GPU. We propose an alternative data-based parallelism scheme for Tour construction, which fits better on the GPU architecture. We also describe novel GPU programming strategies for the Pheromone update stage. Our results show a total speed-up exceeding 28x for the Tour construction stage, and 20x for Pheromone update, and suggest that ACO is a potentially fruitful area for future research in the GPU domain.

show abstract

Parallelization Techniques for Sparse Matrix Applications

Ujaldón

Zapata

Sharma

et al. 1996

Journal of Parallel and Distributed Computing

View full text Add to dashboard Cite

Non-rigid Registration for Large Sets of Microscopic Images on Graphics Processors

Ruiz

Ujaldón

Cooper

et al. 2008

J Sign Process Syst Sign Image Video Technol

View full text Add to dashboard Cite

Microscopic imaging is an important tool for characterizing tissue morphology and pathology. 3D reconstruction and visualization of large sample tissue structure requires registration of large sets of high-resolution images. However, the scale of this problem presents a challenge for automatic registration methods. In this paper we present a novel method for efficient automatic registration using graphics processing units (GPUs) and parallel programming. Comparing a C++ CPU implementation with Compute Unified Device Architecture (CUDA) libraries and pthreads running on GPU we achieve a speed-up factor of up to 4.11× with a single GPU and 6.68× with a GPU pair. We present execution times for a benchmark composed of two sets of large-scale images: mouse placenta (16K × 16K pixels) and breast cancer tumors (23K × 62K pixels). It takes more than 12 hours for the genetic case in C++ to register a typical sample composed of 500 consecutive slides, which was reduced to less than 2 hours using two GPUs, in addition to a very promising scalability for extending those gains easily on a large number of GPUs in a distributed system.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Manuel Ujaldón

Enhancing data parallelism for Ant Colony Optimization on GPUs

Biomedical image analysis on a cooperative cluster of GPUs and multicores

The GPU on the simulation of cellular computing models

Vienna-Fortran/HPF extensions for sparse and irregular problems and their compilation

The 2D wavelet transform on emerging architectures: GPUs and multicores

Parallelization strategies for ant colony optimisation on GPUs

Parallelization Techniques for Sparse Matrix Applications

Non-rigid Registration for Large Sets of Microscopic Images on Graphics Processors

Contact Info

Product

Resources

About