We present several algorithms to compute the solution of a linear system of equations on a GPU, as well as general techniques to improve their performance, such as padding and hybrid GPU-CPU computation. We also show how mixed-precision iterative refinement can be used to regain full accuracy in the solution of linear systems. Experimental results on a G80 using CUBLAS, the implementation of BLAS for NVIDIA GPUs with unified architecture, are given to illustrate the performance of the different algorithms and techniques proposed.
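The refinement scheme this abstract refers to can be sketched compactly. Below is a minimal CPU-only illustration, assuming the classic mixed-precision scheme (factorize and solve in single precision, accumulate residuals and corrections in double); the paper itself runs the expensive single-precision steps on the GPU through CUBLAS, and all names here are illustrative rather than the authors' implementation.

```cpp
// Minimal CPU-only sketch of mixed-precision iterative refinement for A*x = b.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <utility>
#include <vector>

// Solve A*x = b in single precision via Gaussian elimination with partial
// pivoting (stand-in for the GPU factorization and triangular solves).
static std::vector<float> solve_sp(std::vector<float> A, std::vector<float> b) {
    const int n = static_cast<int>(b.size());
    for (int k = 0; k < n; ++k) {
        int p = k;                                    // partial pivoting
        for (int i = k + 1; i < n; ++i)
            if (std::fabs(A[i * n + k]) > std::fabs(A[p * n + k])) p = i;
        for (int j = 0; j < n; ++j) std::swap(A[k * n + j], A[p * n + j]);
        std::swap(b[k], b[p]);
        for (int i = k + 1; i < n; ++i) {             // eliminate column k
            const float m = A[i * n + k] / A[k * n + k];
            for (int j = k; j < n; ++j) A[i * n + j] -= m * A[k * n + j];
            b[i] -= m * b[k];
        }
    }
    std::vector<float> x(n);
    for (int i = n - 1; i >= 0; --i) {                // back substitution
        float s = b[i];
        for (int j = i + 1; j < n; ++j) s -= A[i * n + j] * x[j];
        x[i] = s / A[i * n + i];
    }
    return x;
}

int main() {
    const int n = 2;
    const std::vector<double> A = {4.0, 1.0, 1.0, 3.0};   // row-major
    const std::vector<double> b = {1.0, 2.0};

    // Single-precision copies feed the cheap solver.
    const std::vector<float> Af(A.begin(), A.end());
    std::vector<float> xf = solve_sp(Af, std::vector<float>(b.begin(), b.end()));
    std::vector<double> x(xf.begin(), xf.end());          // initial SP solution

    // Refinement loop: residual in double, correction in single.
    for (int it = 0; it < 10; ++it) {
        std::vector<double> r(n);
        double rnorm = 0.0;
        for (int i = 0; i < n; ++i) {
            r[i] = b[i];
            for (int j = 0; j < n; ++j) r[i] -= A[i * n + j] * x[j];
            rnorm = std::max(rnorm, std::fabs(r[i]));
        }
        if (rnorm < 1e-14) break;                         // double accuracy reached
        // Correction solve in single precision (a real implementation would
        // reuse the LU factors instead of refactorizing each iteration).
        std::vector<float> d = solve_sp(Af, std::vector<float>(r.begin(), r.end()));
        for (int i = 0; i < n; ++i) x[i] += d[i];         // x <- x + correction
    }
    std::printf("x = (%.15f, %.15f)\n", x[0], x[1]);
    return 0;
}
```

The appeal of the scheme is that the O(n^3) factorization runs entirely in the faster, lower-precision arithmetic, while the cheap O(n^2) residual computation in double precision is enough to recover full accuracy for reasonably conditioned systems.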
Current parallel programming frameworks aid developers to a great extent in implementing applications that exploit parallel hardware resources. Nevertheless, developers require additional expertise to properly use and tune them to operate efficiently on specific parallel platforms. On the other hand, porting applications between different parallel programming models and platforms is not straightforward and demands considerable effort and specific knowledge. Apart from that, the lack of high-level parallel pattern abstractions in those frameworks further increases the complexity of developing parallel applications. To pave the way in this direction, this paper proposes GRPPI, a generic and reusable parallel pattern interface for both stream-processing and data-intensive C++ applications. GRPPI accommodates a layer between developers and existing parallel programming frameworks targeting multi-core processors, such as C++ threads, OpenMP and Intel TBB, and accelerators, such as CUDA Thrust. Furthermore, thanks to its high-level C++ application programming interface and pattern composability features, GRPPI allows users to easily expose parallelism in sequential applications via standalone patterns or pattern compositions. We evaluate this interface using an image processing use case and demonstrate its benefits from the usability, flexibility, and performance points of view. Furthermore, we analyze the impact of using stream and data pattern compositions on CPUs, GPUs and heterogeneous configurations.

An approach to relieve developers from this burden is the use of pattern-based parallel programming frameworks, such as SkePU, [2] FastFlow, [3] or Intel TBB. [4] In this sense, design patterns provide a way to encapsulate (using a building-blocks approach) algorithmic aspects, allowing users to implement more robust, readable, and portable solutions at a high level of abstraction. Essentially, these patterns instantiate parallelism while hiding away the complexity of the underlying concurrency mechanisms, e.g., thread management, synchronization, or data sharing. Examples of applications from multiple domains (e.g., financial, medical, and mathematical) that improve their performance through parallel design patterns can be widely found in the literature. [5][6][7] Nevertheless, although all these skeleton frameworks aim to simplify the development of parallel applications, there is no unified standard. [8] Therefore, users must understand several different frameworks, not only to decide which fits their purposes best, but also to use them properly. Not to mention the effort of migrating applications among frameworks, which is likewise an arduous task. To mitigate this situation, this paper presents GRPPI, a generic and reusable high-level C++ parallel pattern interface that comprises both stream and data-parallel patterns. In general, the goal of GRPPI is to simplify the development of parallel applications by offering a single, unified interface on top of these existing frameworks.
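As a concrete illustration of the pattern-based style described above, here is a hedged sketch of a data-parallel map in the GRPPI idiom. The policy type parallel_execution_native and the iterator-based map signature follow the GRPPI publications, but the header path and API details may vary between releases, so treat this as an approximation rather than a verified drop-in program.

```cpp
// Hedged sketch of a data-parallel "map" in the GRPPI idiom.
#include <iterator>
#include <vector>
#include "grppi/grppi.h"   // assumed umbrella header; path may differ by release

int main() {
    std::vector<double> in(1024, 2.0), out(in.size());

    // Native C++-threads back end; swapping in an OpenMP or TBB policy
    // would change only this declaration, which is the portability point
    // the paper makes.
    grppi::parallel_execution_native ex;

    // Apply the transformer element by element, in parallel.
    grppi::map(ex, std::begin(in), std::end(in), std::begin(out),
               [](double v) { return v * v; });
    return 0;
}
```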
As sequencing technologies progress, the amount of data produced grows exponentially, shifting the bottleneck of discovery towards the data analysis phase. In particular, currently available mapping solutions for RNA-seq leave room for improvement in terms of sensitivity and performance, hindering the efficient analysis of transcriptomes by massive sequencing. Here, we present an innovative approach that combines re-engineering, optimization and parallelization. This solution yields a significant increase in mapping sensitivity over a wide range of read lengths and substantially shorter runtimes compared with current RNA-seq mapping methods.
Energy efficiency is a major concern in modern high-performance computing. Still, few studies provide deep insight into the power consumption of scientific applications. Especially for algorithms running on hybrid platforms equipped with hardware accelerators, such as graphics processors, a detailed energy analysis is essential to identify the most costly parts and to evaluate possible improvement strategies. In this paper we analyze the computational and power performance of iterative linear solvers applied to sparse systems arising in several scientific applications. We also study the gains yielded by dynamic voltage/frequency scaling (DVFS), and illustrate that this technique alone cannot reduce the energy cost of iterative linear solvers to a considerable degree. We then apply techniques that set the (multi-core processor in the) host system to a low-consuming state for the time that the GPU is executing. Our experiments conclusively reveal how the combination of these two techniques delivers a notable reduction in energy consumption without a noticeable impact on computational performance.
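The host-idling technique described above can be demonstrated with the standard CUDA runtime API. The sketch below uses cudaSetDeviceFlags(cudaDeviceScheduleBlockingSync), a real runtime flag that makes synchronization calls block (sleep) instead of spin-waiting, so the host cores can drop into a low-power state while the GPU works; the kernel is a placeholder, not the sparse iterative solvers studied in the paper.

```cpp
// Hedged sketch of the host-idling technique: with blocking synchronization,
// the CPU thread sleeps instead of spin-waiting while the GPU computes, so
// the multi-core host can enter a low-power state.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void dummy_kernel(float* x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= 2.0f;                  // stand-in for real solver work
}

int main() {
    // Must run before the CUDA context is created: request that all
    // synchronization calls block (yield the CPU) rather than spin-wait.
    cudaSetDeviceFlags(cudaDeviceScheduleBlockingSync);

    const int n = 1 << 20;
    float* d_x = nullptr;
    cudaMalloc(&d_x, n * sizeof(float));
    cudaMemset(d_x, 0, n * sizeof(float));

    dummy_kernel<<<(n + 255) / 256, 256>>>(d_x, n);

    // The host thread now sleeps until the GPU finishes; the OS power
    // governor (optionally combined with DVFS) can lower the CPU's state.
    cudaDeviceSynchronize();

    cudaFree(d_x);
    std::printf("done\n");
    return 0;
}
```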