Abstract. We highlight the trends leading to the increased appeal of hybrid multicore + GPU systems for high-performance computing. We present a set of techniques that can be used to develop efficient dense linear algebra algorithms for these systems. We illustrate the main ideas with the development of a hybrid LU factorization algorithm in which the computation is split between a multicore processor and a graphics processor, using specific techniques to reduce the amount of pivoting and of communication between the hybrid components. The result is an efficient algorithm that makes balanced use of both the multicore processor and the graphics processor.
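The CPU/GPU split described above can be illustrated with a toy blocked LU factorization. This is a minimal NumPy sketch, not the paper's implementation: everything runs on the CPU here, the function name and block size are illustrative, and the comments only indicate which part of the work would be assigned to each device in a hybrid scheme.

```python
import numpy as np

def blocked_lu(A, nb=2):
    """Right-looking blocked LU without pivoting -- a toy model of the
    hybrid split: the tall, narrow panel factorization is the kind of
    work kept on the multicore CPU, while the wide trailing-matrix
    update (a large matrix-matrix product) is what would be offloaded
    to the GPU."""
    A = A.copy().astype(float)
    n = A.shape[0]
    for k in range(0, n, nb):
        e = min(k + nb, n)
        # "CPU" work: unblocked LU of the panel A[k:, k:e]
        for j in range(k, e):
            A[j+1:, j] /= A[j, j]
            A[j+1:, j+1:e] -= np.outer(A[j+1:, j], A[j, j+1:e])
        # "GPU" work: triangular solve for U12, then a GEMM update
        L11 = np.tril(A[k:e, k:e], -1) + np.eye(e - k)
        A[k:e, e:] = np.linalg.solve(L11, A[k:e, e:])
        A[e:, e:] -= A[e:, k:e] @ A[k:e, e:]
    return A  # unit-lower L and upper U packed in one array
```

A diagonally dominant test matrix is used below because this sketch omits pivoting entirely; the paper's algorithm instead uses dedicated techniques to reduce (not eliminate) pivoting.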
On modern architectures, 32-bit floating-point operations often run at least twice as fast as 64-bit operations. By using a combination of 32-bit and 64-bit floating-point arithmetic, the performance of many dense and sparse linear algebra algorithms can be significantly enhanced while maintaining the 64-bit accuracy of the resulting solution. The approach presented here applies not only to conventional processors but also to other technologies such as Field Programmable Gate Arrays (FPGAs), Graphics Processing Units (GPUs), and the STI Cell BE processor. Results on modern processor architectures and the STI Cell BE are presented.
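The mixed-precision idea can be sketched as iterative refinement: do the expensive O(n^3) factorization in single precision, then recover double-precision accuracy with cheap double-precision residual corrections. This is a minimal NumPy illustration (the function name and tolerances are assumptions; a real implementation would reuse the single-precision LU factors rather than calling a full solve for each correction).

```python
import numpy as np

def mixed_precision_solve(A, b, tol=1e-12, max_iter=30):
    """Sketch of mixed-precision iterative refinement: solve in float32
    (fast), then refine with float64 residuals until the solution
    reaches double-precision accuracy."""
    # Expensive part done in single precision.
    A32 = A.astype(np.float32)
    x = np.linalg.solve(A32, b.astype(np.float32)).astype(np.float64)
    for _ in range(max_iter):
        r = b - A @ x  # residual computed in double precision
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        # Cheap correction solve; for simplicity we call solve() again,
        # but in practice the float32 LU factors would be reused.
        d = np.linalg.solve(A32, r.astype(np.float32)).astype(np.float64)
        x += d
    return x
```

Each refinement step multiplies the error by roughly the single-precision unit roundoff times the condition number, so for reasonably conditioned systems a handful of iterations suffices.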
We show in this paper how linear algebra calculations can be enhanced by statistical techniques in the case of a square linear system Ax = b. We study a random transformation of A that enables us to avoid pivoting and thereby reduce the amount of communication. Numerical experiments show that this randomization can be performed at a very affordable computational price while providing satisfactory accuracy compared to partial pivoting. This random transformation, called the Partial Random Butterfly Transformation (PRBT), is optimized in terms of data storage and flop count. We propose a solver in which the PRBT and the LU factorization with no pivoting take advantage of the latest generation of hybrid multicore/GPU machines, and we compare its Gflop/s performance with a solver implemented in a current parallel library.
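The algebra behind the randomization can be sketched in a few lines. This is a depth-1 dense toy, not the PRBT itself: the paper uses recursive butterflies of depth 2 stored as O(n) vectors, and here `np.linalg.solve` merely stands in for an LU factorization with no pivoting (NumPy's solver actually pivots). All names are illustrative.

```python
import numpy as np

def butterfly(n, rng):
    """Random butterfly matrix B = (1/sqrt(2)) [[R0, R1], [R0, -R1]]
    with R0, R1 random nonsingular diagonal blocks (n must be even)."""
    h = n // 2
    r0 = np.diag(np.exp(rng.uniform(-0.05, 0.05, h)))
    r1 = np.diag(np.exp(rng.uniform(-0.05, 0.05, h)))
    return np.block([[r0, r1], [r0, -r1]]) / np.sqrt(2.0)

def prbt_solve(A, b, seed=0):
    """Depth-1 sketch of the randomization idea: form A_r = U^T A V,
    factor A_r without pivoting, then recover x = V z from A_r z = U^T b."""
    rng = np.random.default_rng(seed)
    n = len(b)
    U, V = butterfly(n, rng), butterfly(n, rng)
    Ar = U.T @ A @ V                    # randomized matrix
    z = np.linalg.solve(Ar, U.T @ b)    # stands in for no-pivoting LU
    return V @ z
```

Since U^T A V z = U^T b and U is invertible, A(Vz) = b, so x = Vz solves the original system exactly; the randomization only changes which matrix gets factored.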
Abstract. Empirical auto-tuning and machine learning techniques have shown high potential to improve execution time, power consumption, code size, reliability, and other important metrics of various applications for more than two decades. However, they are still far from widespread production use due to the lack of native support for auto-tuning in an ever-changing and complex software and hardware stack, large and multi-dimensional optimization spaces, excessively long exploration times, and the lack of unified mechanisms for preserving and sharing optimization knowledge and research material. We present a possible collaborative approach to solving these problems using the Collective Mind knowledge management system. In contrast with the previous cTuning framework, this modular infrastructure allows whole auto-tuning setups, with all related artifacts and their software and hardware dependencies, to be preserved and shared through the Internet, rather than just performance data. It also allows all available research material, including tools, benchmarks, data sets, search strategies, and machine learning models, to be gradually structured, systematized, and described. Researchers can take advantage of shared components and data with extensible meta-descriptions to quickly and collaboratively validate and improve existing auto-tuning and benchmarking techniques or prototype new ones. The community can now gradually learn and improve the complex behavior of all existing computer systems while exposing behavioral anomalies or model mispredictions to an interdisciplinary community in a reproducible way for further analysis. We present several practical, collaborative, and model-driven auto-tuning scenarios. We have also released all material at c-mind.org/repo to set an example for collaborative and reproducible research, as well as for our new publication model in computer engineering, in which experimental results are continuously shared and validated by the community.
Abstract. We consider the linear least squares problem min_{y∈R^n} ‖Ay − b‖_2, where b ∈ R^m and A ∈ R^{m×n} is a matrix of full column rank n, and we denote by x its solution. We assume that both A and b can be perturbed, and that these perturbations are measured using the Frobenius or the spectral norm for A and the Euclidean norm for b. In this paper, we are concerned with the condition number of a linear function of x (L^T x, where L ∈ R^{n×k}), for which we provide a sharp estimate that lies within a factor √3 of the true condition number. Provided the triangular factor R of A from A^T A = R^T R is available, this estimate can be computed in 2kn^2 flops. We also propose a statistical method that estimates the partial condition number by using the exact condition numbers in random orthogonal directions. If R is available, this statistical approach yields a condition estimate at a lower computational cost. In the case of the Frobenius norm, we derive a closed formula for the partial condition number based on the singular values and the right singular vectors of the matrix A.

Keywords: linear least squares, normwise condition number, statistical condition estimate, parameter estimation

1. Introduction. Perturbation theory has been applied to many problems of linear algebra, such as linear systems, linear least squares, and eigenvalue problems [1,4,11,18]. In this paper we consider the problem of calculating the quantity L^T x, where x is the solution of the linear least squares problem (LLSP) min_{x∈R^n} ‖Ax − b‖_2, where b ∈ R^m and A ∈ R^{m×n} is a matrix of full column rank n. This estimation is a fundamental problem of parameter estimation. More precisely, we focus here on evaluating the sensitivity of L^T x to small perturbations of the matrix A and/or the right-hand side b, where L ∈ R^{n×k} and x is the solution of the LLSP.
The interest in this question stems, for instance, from parameter estimation, where the parameters of the model can often be divided into two parts: the variables of physical significance and a set of ancillary variables involved in the model. For example, this situation occurs in the determination of positions using the GPS system, where the 3-D coordinates are the quantities of interest but the statistical model involves other parameters, such as clock drift and GPS ambiguities [12], that are generally estimated during the solution process. It is then crucial to ensure that the solution components of interest can be computed with satisfactory accuracy. The main goal of this paper is to formalize this problem in terms of a condition number and to describe practical methods to compute or estimate this quantity. Note that, as far as the sensitivity of a subset of the solution components is concerned, the matrix L is a projection whose columns are vectors of the canonical basis of R^n.
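What the partial condition number measures can be probed numerically. The sketch below is NOT the paper's sharp estimate or its statistical estimator; it is only a crude Monte Carlo finite-difference illustration of the quantity being bounded — the worst observed amplification of a small (A, b) perturbation in L^T x — with all names and parameters assumed for illustration.

```python
import numpy as np

def lls_sensitivity(A, b, L, eps=1e-8, samples=20, seed=0):
    """Crude finite-difference probe of the sensitivity of L^T x to
    perturbations of (A, b), where x solves min ||Ay - b||_2.
    Returns the largest observed ratio ||L^T dx|| / ||(dA, db)||,
    a lower bound on the absolute partial condition number."""
    rng = np.random.default_rng(seed)
    x = np.linalg.lstsq(A, b, rcond=None)[0]
    base = L.T @ x
    worst = 0.0
    for _ in range(samples):
        dA = rng.standard_normal(A.shape)
        db = rng.standard_normal(b.shape)
        # Scale the joint perturbation (dA, db) to norm eps.
        s = eps / np.sqrt(np.linalg.norm(dA, 'fro')**2
                          + np.linalg.norm(db)**2)
        xp = np.linalg.lstsq(A + s * dA, b + s * db, rcond=None)[0]
        worst = max(worst, np.linalg.norm(L.T @ xp - base) / eps)
    return worst
```

Taking L as columns of the identity restricts the probe to a subset of solution components, matching the projection case discussed above.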
We prove duality results for adjoint operators and product norms in the framework of Euclidean spaces. We show how these results can be used to derive condition numbers, especially when perturbations of the data are measured componentwise relative to the original data. We apply this technique to obtain formulas for the componentwise and mixed condition numbers of a linear function of a linear least squares solution. These expressions are in closed form when perturbations of the solution are measured using a componentwise norm or the infinity norm, and we obtain an upper bound for the Euclidean norm.