Jakob Siegel scite author profile

Modern GPUs open a completely new field for optimizing embarrassingly parallel algorithms. Implementing an algorithm on a GPU confronts the programmer with a new set of challenges for program optimization, especially tuning the program for the GPU memory hierarchy, whose organization and performance implications are radically different from those of general-purpose CPUs, and optimizing programs at the instruction level for the GPU. In this paper we analyze different approaches for optimizing memory usage and access patterns for GPUs and propose a class of memory layout optimizations that can take full advantage of the unique memory hierarchy of NVIDIA CUDA. Furthermore, we analyze some classical optimization techniques and how they affect performance on a GPU. We used the Gravit gravity simulator to demonstrate these optimizations. The final optimized GPU version achieves an 87× speedup compared to the original CPU version. Almost 30% of this speedup is a direct result of the optimizations discussed in this paper.
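
The abstract does not say which memory layouts the paper proposes. As a rough illustration of what a memory layout optimization can look like, the sketch below shows one widely used transformation: converting an array of structures (AoS) into a structure of arrays (SoA), so that neighbouring GPU threads reading the same field touch neighbouring addresses. The names ParticleAoS, ParticlesSoA, and to_soa are hypothetical and are not taken from the paper or from Gravit; the code is host-side C++ only and makes no claim about the authors' actual method.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical AoS layout: one record per particle. On a GPU, thread i
// reading particles[i].x would stride over y, z, and mass in memory.
struct ParticleAoS {
    float x, y, z, mass;
};

// Hypothetical SoA layout: one contiguous array per field, so thread i
// reading x[i] sits next to thread i+1 reading x[i+1].
struct ParticlesSoA {
    std::vector<float> x, y, z, mass;
};

// Convert an AoS particle list into the SoA layout (illustrative helper).
ParticlesSoA to_soa(const std::vector<ParticleAoS>& in) {
    ParticlesSoA out;
    out.x.reserve(in.size());
    out.y.reserve(in.size());
    out.z.reserve(in.size());
    out.mass.reserve(in.size());
    for (const ParticleAoS& p : in) {
        out.x.push_back(p.x);
        out.y.push_back(p.y);
        out.z.push_back(p.z);
        out.mass.push_back(p.mass);
    }
    return out;
}
```

In a CUDA port, the per-field arrays would typically be copied to device memory separately; whether this layout or a hybrid of the two performs better depends on the access pattern and the hardware.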

An empirically tuned 2D and 3D FFT library on CUDA GPU

Siegel

2010

Efficient sparse matrix-matrix multiplication on heterogeneous high performance systems

Siegel

Villa

Krishnamoorthy

et al. 2010

Jakob Siegel

A control-structure splitting optimization for GPGPU

Using GPUs to compute large out-of-card FFTs

CUDA Memory Optimizations for Large Data-Structures in the Gravit Simulator

An empirically tuned 2D and 3D FFT library on CUDA GPU

Efficient sparse matrix-matrix multiplication on heterogeneous high performance systems
