The Effect of IFRS Adoption on Corporate Performance of Listed Firms in Nigeria

The limiting factor for efficiency of sparse linear solvers is the memory bandwidth. In this work, we describe a fast Conjugate Gradient solver for unstructured problems, which runs on multiple GPUs installed on a single mainboard. The solver achieves double precision accuracy with single precision GPUs, using a mixed precision iterative refinement algorithm. To achieve high computation speed, we propose a fast sparse matrix-vector multiplication algorithm, which is the core operation of iterative solvers. The proposed multiplication algorithm efficiently utilizes GPU resources via caching, coalesced memory accesses and load balance between running threads. Experiments on wide range of matrices show that our matrix-vector multiplication algorithm achieves up to 11.6 Gflops on single GeForce 8800 GTS card and CG implementation achieves up to 24.6 Gflops with four GPUs.

show abstract

High performance conjugate gradient solver on multi-GPU clusters using hypergraph partitioning

Cevahir

Nukada

Matsuoka

2010

Comput Sci Res Dev

View full text Add to dashboard Cite

Motivated by high computation power and low price per performance ratio of GPUs, GPU accelerated clusters are being built for high performance scientific computing. In this work, we propose a scalable implementation of a Conjugate Gradient (CG) solver for unstructured matrices on a GPU-extended cluster, where each cluster node has multiple GPUs. Basic computations of the solver are held on GPUs and communications are managed by the CPU. For sparse matrix-vector multiplication, which is the most timeconsuming operation, solver selects the fastest between several high performance kernels running on GPUs. In a GPUextended cluster, it is more difficult than traditional CPU clusters to obtain scalability, since GPUs are very fast compared to CPUs. Since computation on GPUs is faster, GPUextended clusters demand faster communication between compute units. To achieve scalability, we adopt hypergraphpartitioning models, which are state-of-the-art models for communication reduction and load balancing for parallel sparse iterative solvers. We implement a hierarchical partitioning model which better optimizes underlying heterogeneous system. In our experiments, we obtain up to 94 Gflops double-precision CG performance using 64 NVIDIA Tesla GPUs on 32 nodes.A. Cevahir ( ) · A. Nukada · S. Matsuoka

show abstract

Site-Based Partitioning and Repartitioning Techniques for Parallel PageRank Computation

Cevahir

Aykanat

Türk

et al. 2011

IEEE Trans. Parallel Distrib. Syst.

View full text Add to dashboard Cite

Abstract-The PageRank algorithm is an important component in effective web search. At the core of this algorithm are repeated sparse matrix-vector multiplications where the involved web matrices grow in parallel with the growth of the web and are stored in a distributed manner due to space limitations. Hence, the PageRank computation, which is frequently repeated, must be performed in parallel with high-efficiency and low-preprocessing overhead while considering the initial distributed nature of the web matrices. Our contributions in this work are twofold. We first investigate the application of state-of-the-art sparse matrix partitioning models in order to attain high efficiency in parallel PageRank computations with a particular focus on reducing the preprocessing overhead they introduce. For this purpose, we evaluate two different compression schemes on the web matrix using the site information inherently available in links. Second, we consider the more realistic scenario of starting with an initially distributed data and extend our algorithms to cover the repartitioning of such data for efficient PageRank computation. We report performance results using our parallelization of a state-of-the-art PageRank algorithm on two different PC clusters with 40 and 64 processors. Experiments show that the proposed techniques achieve considerably high speedups while incurring a preprocessing overhead of several iterations (for some instances even less than a single iteration) of the underlying sequential PageRank algorithm.

show abstract

GPU-Enabled High Performance Online Visual Search with High Accuracy

Cevahir

Torii

2012

View full text Add to dashboard Cite

A Web-Site-Based Partitioning Technique for Reducing Preprocessing Overhead of Parallel PageRank Computation

Cevahir

Aykanat

Türk

et al.

View full text Add to dashboard Cite

Abstract. A power method formulation, which efficiently handles the problem of dangling pages, is investigated for parallelization of PageRank computation. Hypergraph-partitioning-based sparse matrix partitioning methods can be successfully used for efficient parallelization. However, the preprocessing overhead due to hypergraph partitioning, which must be repeated often due to the evolving nature of the Web, is quite significant compared to the duration of the PageRank computation. To alleviate this problem, we utilize the information that sites form a natural clustering on pages to propose a site-based hypergraph-partitioning technique, which does not degrade the quality of the parallelization. We also propose an efficient parallelization scheme for matrix-vector multiplies in order to avoid possible communication due to the pages without in-links. Experimental results on realistic datasets validate the effectiveness of the proposed models.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

customersupport@researchsolutions.com

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Ali Cevahir

Fast Conjugate Gradients with Multiple GPUs

High performance conjugate gradient solver on multi-GPU clusters using hypergraph partitioning

Site-Based Partitioning and Repartitioning Techniques for Parallel PageRank Computation

GPU-Enabled High Performance Online Visual Search with High Accuracy

A Web-Site-Based Partitioning Technique for Reducing Preprocessing Overhead of Parallel PageRank Computation

Contact Info

Product

Resources

About