Abstract. We highlight the trends leading to the increased appeal of hybrid multicore + GPU systems for high-performance computing. We present a set of techniques that can be used to develop efficient dense linear algebra algorithms for these systems. We illustrate the main ideas with the development of a hybrid LU factorization algorithm in which the computation is split between a multicore processor and a graphics processor, using specific techniques to reduce the amount of pivoting and of communication between the hybrid components. The result is an efficient algorithm that makes balanced use of both the multicore processor and the graphics processor.
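The CPU/GPU split described above can be illustrated with a toy blocked LU factorization. This is a minimal NumPy sketch, not the paper's implementation: everything runs on the CPU here, the function name and block size are illustrative, and the comments only indicate which part of the work would be assigned to each device in a hybrid scheme.

```python
import numpy as np

def blocked_lu(A, nb=2):
    """Right-looking blocked LU without pivoting -- a toy model of the
    hybrid split: the tall, narrow panel factorization is the kind of
    work kept on the multicore CPU, while the wide trailing-matrix
    update (a large matrix-matrix product) is what would be offloaded
    to the GPU."""
    A = A.copy().astype(float)
    n = A.shape[0]
    for k in range(0, n, nb):
        e = min(k + nb, n)
        # "CPU" work: unblocked LU of the panel A[k:, k:e]
        for j in range(k, e):
            A[j+1:, j] /= A[j, j]
            A[j+1:, j+1:e] -= np.outer(A[j+1:, j], A[j, j+1:e])
        # "GPU" work: triangular solve for U12, then a GEMM update
        L11 = np.tril(A[k:e, k:e], -1) + np.eye(e - k)
        A[k:e, e:] = np.linalg.solve(L11, A[k:e, e:])
        A[e:, e:] -= A[e:, k:e] @ A[k:e, e:]
    return A  # unit-lower L and upper U packed in one array
```

A diagonally dominant test matrix is used below because this sketch omits pivoting entirely; the paper's algorithm instead uses dedicated techniques to reduce (not eliminate) pivoting.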
On modern architectures, 32-bit floating-point operations often run at least twice as fast as 64-bit operations. By using a combination of 32-bit and 64-bit floating-point arithmetic, the performance of many dense and sparse linear algebra algorithms can be significantly enhanced while maintaining the 64-bit accuracy of the resulting solution. The approach presented here applies not only to conventional processors but also to other technologies such as Field Programmable Gate Arrays (FPGAs), Graphics Processing Units (GPUs), and the STI Cell BE processor. Results on modern processor architectures and the STI Cell BE are presented.
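The mixed-precision idea can be sketched as iterative refinement: do the expensive O(n^3) factorization in single precision, then recover double-precision accuracy with cheap double-precision residual corrections. This is a minimal NumPy illustration (the function name and tolerances are assumptions; a real implementation would reuse the single-precision LU factors rather than calling a full solve for each correction).

```python
import numpy as np

def mixed_precision_solve(A, b, tol=1e-12, max_iter=30):
    """Sketch of mixed-precision iterative refinement: solve in float32
    (fast), then refine with float64 residuals until the solution
    reaches double-precision accuracy."""
    # Expensive part done in single precision.
    A32 = A.astype(np.float32)
    x = np.linalg.solve(A32, b.astype(np.float32)).astype(np.float64)
    for _ in range(max_iter):
        r = b - A @ x  # residual computed in double precision
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        # Cheap correction solve; for simplicity we call solve() again,
        # but in practice the float32 LU factors would be reused.
        d = np.linalg.solve(A32, r.astype(np.float32)).astype(np.float64)
        x += d
    return x
```

Each refinement step multiplies the error by roughly the single-precision unit roundoff times the condition number, so for reasonably conditioned systems a handful of iterations suffices.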
We show in this paper how linear algebra calculations can be enhanced by statistical techniques in the case of a square linear system Ax = b. We study a random transformation of A that enables us to avoid pivoting and thereby reduce the amount of communication. Numerical experiments show that this randomization can be performed at a very affordable computational price while providing satisfactory accuracy compared to partial pivoting. This random transformation, called the Partial Random Butterfly Transformation (PRBT), is optimized in terms of data storage and flop count. We propose a solver in which the PRBT and the LU factorization with no pivoting take advantage of the latest generation of hybrid multicore/GPU machines, and we compare its Gflop/s performance with a solver implemented in a current parallel library.
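The algebra behind the randomization can be sketched in a few lines. This is a depth-1 dense toy, not the PRBT itself: the paper uses recursive butterflies of depth 2 stored as O(n) vectors, and here `np.linalg.solve` merely stands in for an LU factorization with no pivoting (NumPy's solver actually pivots). All names are illustrative.

```python
import numpy as np

def butterfly(n, rng):
    """Random butterfly matrix B = (1/sqrt(2)) [[R0, R1], [R0, -R1]]
    with R0, R1 random nonsingular diagonal blocks (n must be even)."""
    h = n // 2
    r0 = np.diag(np.exp(rng.uniform(-0.05, 0.05, h)))
    r1 = np.diag(np.exp(rng.uniform(-0.05, 0.05, h)))
    return np.block([[r0, r1], [r0, -r1]]) / np.sqrt(2.0)

def prbt_solve(A, b, seed=0):
    """Depth-1 sketch of the randomization idea: form A_r = U^T A V,
    factor A_r without pivoting, then recover x = V z from A_r z = U^T b."""
    rng = np.random.default_rng(seed)
    n = len(b)
    U, V = butterfly(n, rng), butterfly(n, rng)
    Ar = U.T @ A @ V                    # randomized matrix
    z = np.linalg.solve(Ar, U.T @ b)    # stands in for no-pivoting LU
    return V @ z
```

Since U^T A V z = U^T b and U is invertible, A(Vz) = b, so x = Vz solves the original system exactly; the randomization only changes which matrix gets factored.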
Abstract. Empirical auto-tuning and machine learning techniques have shown high potential to improve execution time, power consumption, code size, reliability, and other important metrics of various applications for more than two decades. However, they are still far from widespread production use due to the lack of native support for auto-tuning in an ever-changing and complex software and hardware stack, large and multi-dimensional optimization spaces, excessively long exploration times, and the lack of unified mechanisms for preserving and sharing optimization knowledge and research material. We present a possible collaborative approach to solving these problems using the Collective Mind knowledge management system. In contrast with the previous cTuning framework, this modular infrastructure allows whole auto-tuning setups, with all related artifacts and their software and hardware dependencies, to be preserved and shared through the Internet, rather than just performance data. It also allows all available research material, including tools, benchmarks, data sets, search strategies, and machine learning models, to be gradually structured, systematized, and described. Researchers can take advantage of shared components and data with extensible meta-descriptions to quickly and collaboratively validate and improve existing auto-tuning and benchmarking techniques or prototype new ones. The community can now gradually learn and improve the complex behavior of all existing computer systems while exposing behavioral anomalies or model mispredictions to an interdisciplinary community in a reproducible way for further analysis. We present several practical, collaborative, and model-driven auto-tuning scenarios. We have also released all material at c-mind.org/repo to set an example for collaborative and reproducible research, as well as for our new publication model in computer engineering, in which experimental results are continuously shared and validated by the community.
Abstract. We consider the linear least squares problem min_{y∈R^n} ‖Ay − b‖_2, where b ∈ R^m and A ∈ R^{m×n} is a matrix of full column rank n, and we denote by x its solution. We assume that both A and b can be perturbed, and that these perturbations are measured using the Frobenius or the spectral norm for A and the Euclidean norm for b. In this paper, we are concerned with the condition number of a linear function of x (L^T x, where L ∈ R^{n×k}), for which we provide a sharp estimate that lies within a factor √3 of the true condition number. Provided the triangular factor R of A from A^T A = R^T R is available, this estimate can be computed in 2kn^2 flops. We also propose a statistical method that estimates the partial condition number by using the exact condition numbers in random orthogonal directions. If R is available, this statistical approach yields a condition estimate at a lower computational cost. In the case of the Frobenius norm, we derive a closed formula for the partial condition number based on the singular values and the right singular vectors of the matrix A.

Keywords: linear least squares, normwise condition number, statistical condition estimate, parameter estimation

1. Introduction. Perturbation theory has been applied to many problems of linear algebra, such as linear systems, linear least squares, and eigenvalue problems [1,4,11,18]. In this paper we consider the problem of calculating the quantity L^T x, where x is the solution of the linear least squares problem (LLSP) min_{x∈R^n} ‖Ax − b‖_2, where b ∈ R^m and A ∈ R^{m×n} is a matrix of full column rank n. This estimation is a fundamental problem of parameter estimation. More precisely, we focus here on evaluating the sensitivity of L^T x to small perturbations of the matrix A and/or the right-hand side b, where L ∈ R^{n×k} and x is the solution of the LLSP.
The interest in this question stems, for instance, from parameter estimation, where the parameters of the model can often be divided into two parts: the variables of physical significance and a set of ancillary variables involved in the model. For example, this situation occurs in the determination of positions using the GPS system, where the 3-D coordinates are the quantities of interest but the statistical model involves other parameters, such as clock drift and GPS ambiguities [12], that are generally estimated during the solution process. It is then crucial to ensure that the solution components of interest can be computed with satisfactory accuracy. The main goal of this paper is to formalize this problem in terms of a condition number and to describe practical methods to compute or estimate this quantity. Note that, as far as the sensitivity of a subset of the solution components is concerned, the matrix L is a projection whose columns are vectors of the canonical basis of R^n.
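What the partial condition number measures can be probed numerically. The sketch below is NOT the paper's sharp estimate or its statistical estimator; it is only a crude Monte Carlo finite-difference illustration of the quantity being bounded — the worst observed amplification of a small (A, b) perturbation in L^T x — with all names and parameters assumed for illustration.

```python
import numpy as np

def lls_sensitivity(A, b, L, eps=1e-8, samples=20, seed=0):
    """Crude finite-difference probe of the sensitivity of L^T x to
    perturbations of (A, b), where x solves min ||Ay - b||_2.
    Returns the largest observed ratio ||L^T dx|| / ||(dA, db)||,
    a lower bound on the absolute partial condition number."""
    rng = np.random.default_rng(seed)
    x = np.linalg.lstsq(A, b, rcond=None)[0]
    base = L.T @ x
    worst = 0.0
    for _ in range(samples):
        dA = rng.standard_normal(A.shape)
        db = rng.standard_normal(b.shape)
        # Scale the joint perturbation (dA, db) to norm eps.
        s = eps / np.sqrt(np.linalg.norm(dA, 'fro')**2
                          + np.linalg.norm(db)**2)
        xp = np.linalg.lstsq(A + s * dA, b + s * db, rcond=None)[0]
        worst = max(worst, np.linalg.norm(L.T @ xp - base) / eps)
    return worst
```

Taking L as columns of the identity restricts the probe to a subset of solution components, matching the projection case discussed above.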
We prove duality results for adjoint operators and product norms in the framework of Euclidean spaces. We show how these results can be used to derive condition numbers, especially when perturbations of the data are measured componentwise relative to the original data. We apply this technique to obtain formulas for the componentwise and mixed condition numbers of a linear function of a linear least squares solution. These expressions are in closed form when perturbations of the solution are measured using a componentwise norm or the infinity norm, and we obtain an upper bound for the Euclidean norm.