Aravind Sukumaran-Rajam scite author profile

Tiling is a key technique for data locality optimization and is widely used in high-performance implementations of dense matrix-matrix multiplication for multicore/manycore CPUs and GPUs. However, the irregular and matrix-dependent data access pattern of sparse matrix multiplication makes it challenging to use tiling to enhance data reuse. In this paper, we devise an adaptive tiling strategy and apply it to enhance the performance of two primitives: SpMM (product of sparse matrix and dense matrix) and SDDMM (sampled dense-dense matrix multiplication). In contrast to studies that have resorted to non-standard sparse-matrix representations to enhance performance, we use the standard Compressed Sparse Row (CSR) representation, within which intra-row reordering is performed to enable adaptive tiling. Experimental evaluation using an extensive set of matrices from the SuiteSparse collection demonstrates significant performance improvement over currently available state-of-the-art alternatives.
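For readers unfamiliar with the two primitives named in this abstract, the following is a minimal reference sketch of what SpMM and SDDMM compute over a CSR matrix. It is not the adaptive tiling strategy of the paper and does no reordering or tiling; the function names `spmm_csr` and `sddmm_csr`, the argument layout, and the convention that SDDMM scales each stored nonzero by a dot product of rows of two dense matrices are illustrative assumptions.

```python
# Reference (untiled) semantics only; names and layout are hypothetical.

def spmm_csr(row_ptr, col_idx, vals, dense):
    """Return C = A * B, with A in CSR form and B a dense list of rows."""
    n_rows = len(row_ptr) - 1
    n_out = len(dense[0])
    out = [[0.0] * n_out for _ in range(n_rows)]
    for i in range(n_rows):
        # Nonzeros of row i live in positions row_ptr[i] .. row_ptr[i+1]-1.
        for p in range(row_ptr[i], row_ptr[i + 1]):
            k, a = col_idx[p], vals[p]
            for j in range(n_out):
                out[i][j] += a * dense[k][j]
    return out


def sddmm_csr(row_ptr, col_idx, vals, x, y):
    """For each stored nonzero (i, k), return vals[p] * dot(x[i], y[k])."""
    out = [0.0] * len(vals)
    for i in range(len(row_ptr) - 1):
        for p in range(row_ptr[i], row_ptr[i + 1]):
            k = col_idx[p]
            out[p] = vals[p] * sum(a * b for a, b in zip(x[i], y[k]))
    return out


# A = [[1, 0, 2],
#      [0, 3, 0]] in CSR form.
row_ptr = [0, 2, 3]
col_idx = [0, 2, 1]
vals = [1.0, 2.0, 3.0]

spmm_result = spmm_csr(row_ptr, col_idx, vals, [[1, 2], [3, 4], [5, 6]])
sddmm_result = sddmm_csr(row_ptr, col_idx, vals,
                         [[1, 1], [2, 0]],
                         [[1, 0], [0, 1], [1, 1]])
```

In both loops the dense rows that get touched are selected by `col_idx`, so the access pattern depends on the sparsity structure of the matrix, which is the property the abstract identifies as the obstacle to tiling.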

Efficient sparse-matrix multi-vector product on GPUs

Hong, Sukumaran-Rajam, Bandyopadhyay, et al. 2018

An efficient mixed-mode representation of sparse tensors

Nisa, Sukumaran-Rajam, et al. 2019

Load-Balanced Sparse MTTKRP on GPUs

Nisa, Sukumaran-Rajam, et al. 2019

Sparse matricized tensor times Khatri-Rao product (MTTKRP) is one of the most computationally expensive kernels in sparse tensor computations. This work focuses on optimizing the MTTKRP operation on GPUs, addressing both performance and storage requirements. We begin by identifying the performance bottlenecks in directly extending the state-of-the-art CSF (compressed sparse fiber) format from CPUs to GPUs. A significant challenge with GPUs compared to multicore CPUs is that of utilizing the much greater degree of parallelism in a load-balanced fashion for irregular computations like sparse MTTKRP. To address this issue, we develop a new storage-efficient representation for tensors that enables high-performance, load-balanced execution of MTTKRP on GPUs. A GPU implementation of sparse MTTKRP using the new sparse tensor representation is shown to outperform all currently known parallel sparse CPU and GPU MTTKRP implementations.
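As a point of reference for the kernel this abstract describes, here is a minimal sequential sketch of mode-0 MTTKRP for a 3-way sparse tensor. It assumes a plain coordinate (COO) list of nonzeros rather than CSF or the new representation proposed in the paper, and the name `mttkrp_mode0` and its arguments are hypothetical.

```python
# Sequential reference semantics only; COO storage is an assumption.

def mttkrp_mode0(coords, vals, b, c, n_i):
    """out[i][r] = sum over nonzeros (i, j, k) of val * b[j][r] * c[k][r]."""
    rank = len(b[0])
    out = [[0.0] * rank for _ in range(n_i)]
    for (i, j, k), v in zip(coords, vals):
        for r in range(rank):
            out[i][r] += v * b[j][r] * c[k][r]
    return out


# A 2 x 2 x 2 tensor with three nonzeros and rank-2 factor matrices.
coords = [(0, 0, 0), (0, 1, 1), (1, 0, 1)]
vals = [1.0, 2.0, 3.0]
b = [[1, 2], [3, 4]]
c = [[5, 6], [7, 8]]

mttkrp_result = mttkrp_mode0(coords, vals, b, c, n_i=2)
```

The work attached to each output row is proportional to the number of nonzeros that share its index `i`, which varies from row to row; that unevenness is the load-balancing concern the abstract raises for GPUs.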

Domain-Specific Optimization and Generation of High-Performance GPU Code for Stencil Computations

et al. 2018

Effective Machine Learning Based Format Selection and Performance Modeling for SpMV on GPUs

Nisa, Siegel, Sukumaran-Rajam, et al. 2018

The Polyhedral Model of Nonlinear Loops

Sukumaran-Rajam, Clauss 2015. ACM Trans. Archit. Code Optim.

Register optimizations for stencils on GPUs

Rawat, Rastello, Sukumaran-Rajam, et al. 2018

The recent advent of compute-intensive GPU architecture has allowed application developers to explore high-order 3D stencils for better computational accuracy. A common optimization strategy for such stencils is to expose sufficient data reuse by means such as loop unrolling, with the expectation of register-level reuse. However, the resulting code is often highly constrained by register pressure. While current state-of-the-art register allocators are satisfactory for most applications, they are unable to effectively manage register pressure for such complex high-order stencils, resulting in sub-optimal code with a large number of register spills. In this paper, we develop a statement reordering framework that models stencil computations as a DAG of trees with shared leaves, and adapts an optimal scheduling algorithm for minimizing register usage for expression trees. The effectiveness of the approach is demonstrated through experimental results on a range of stencils extracted from application codes.

[Code listing, partly lost in extraction: a stencil loop nest whose innermost statement is out[i][j] += in[i+ii][j+jj] * w[ii+2][jj+2], with jj running from -2 to 2; the bounds of the enclosing loops did not survive.]
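To make the kind of computation discussed in this abstract concrete, the sketch below applies the update statement out[i][j] += in[i+ii][j+jj] * w[ii+2][jj+2] over a 5 x 5 neighborhood. Only that statement and the jj range come from the page; the ii range, the interior bounds on i and j, and the name `stencil_5x5` are assumptions chosen so that every access stays in bounds. It illustrates the stencil itself, not the statement reordering framework.

```python
# Illustrative 2D stencil; loop bounds other than jj are assumptions.

def stencil_5x5(inp, w):
    """Weighted sum of the 5 x 5 neighborhood around each interior point."""
    n, m = len(inp), len(inp[0])
    out = [[0.0] * m for _ in range(n)]
    # Skip a border of width 2 so that inp[i + ii][j + jj] is always valid.
    for i in range(2, n - 2):
        for j in range(2, m - 2):
            for ii in range(-2, 3):
                for jj in range(-2, 3):
                    out[i][j] += inp[i + ii][j + jj] * w[ii + 2][jj + 2]
    return out


ones = [[1.0] * 5 for _ in range(5)]
stencil_result = stencil_5x5(ones, ones)
```

Each output point reads 25 inputs and 25 weights, and neighboring output points share most of those inputs. Unrolling the loops to keep the shared values in registers is the reuse strategy, and the source of the register pressure, that the abstract describes.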
