Sana Damani scite author profile

Sana Damani

5 Publications

0 Citation Statements Received

65 Citation Statements Given

Affiliations

Nvidia (United States), Georgia Institute of Technology

Publications

VEGETA: Vertically-Integrated Extensions for Sparse/Dense GEMM Tile Acceleration on CPUs

Jeong, Damani, Rajeshkumar, et al. 2023

Preprint

Deep Learning (DL) acceleration support in CPUs has recently gained a lot of traction, with several companies (Arm, Intel, IBM) announcing products with specialized matrix engines accessible via GEMM instructions. CPUs are pervasive and need to handle diverse requirements across DL workloads running in edge/HPC/cloud platforms. Therefore, as DL workloads embrace sparsity to reduce the computations and memory size of models, it is also imperative for CPUs to add support for sparsity to avoid under-utilization of the dense matrix engine and inefficient usage of the caches and registers. This work presents VEGETA, a set of ISA and microarchitecture extensions over dense matrix engines to support flexible structured sparsity for CPUs, enabling programmable support for diverse DL models with varying degrees of sparsity. Compared to the state-of-the-art (SOTA) dense matrix engine in CPUs, a VEGETA engine provides 1.09×, 2.20×, 3.74×, and 3.28× speed-ups when running 4:4 (dense), 2:4, 1:4, and unstructured (95%) sparse DNN layers.
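
The N:M notation in the abstract (4:4, 2:4, 1:4) describes structured sparsity: at most N non-zero values in every group of M consecutive weights. The sketch below only illustrates that pattern. It assumes magnitude-based pruning, a common choice that the abstract does not specify, and the names `prune_n_of_m` and `density` are hypothetical. It does not model the VEGETA instructions or hardware.

```python
def prune_n_of_m(row, n, m):
    """Keep the n largest-magnitude values in each group of m; zero the rest.

    Assumption: magnitude-based selection, chosen here for illustration only.
    """
    if len(row) % m != 0:
        raise ValueError("row length must be a multiple of m")
    pruned = []
    for start in range(0, len(row), m):
        group = row[start:start + m]
        # Indices of the n largest magnitudes within this group.
        order = sorted(range(m), key=lambda i: abs(group[i]), reverse=True)
        keep = set(order[:n])
        pruned.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return pruned


def density(row):
    """Fraction of non-zero entries in the row."""
    return sum(1 for v in row if v != 0) / len(row)


weights = [0.9, -0.1, 0.4, 0.05, -0.7, 0.2, 0.3, -0.8]
dense = prune_n_of_m(weights, 4, 4)     # 4:4 keeps every weight
two_four = prune_n_of_m(weights, 2, 4)  # 2:4 keeps half of each group
one_four = prune_n_of_m(weights, 1, 4)  # 1:4 keeps a quarter of each group
```

In this toy row, 4:4 leaves the weights unchanged, 2:4 leaves half of them non-zero, and 1:4 leaves a quarter. That fixed ratio per group is what distinguishes these patterns from the unstructured case named in the abstract.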

cuCatch: A Debugging Tool for Efficiently Catching Memory Safety Violations in CUDA Applications

Ziad, Damani, Jaleel, et al. 2023

Proc. ACM Program. Lang.

CUDA, OpenCL, and OpenACC are the primary means of writing general-purpose software for NVIDIA GPUs, all of which are subject to the same well-documented memory safety vulnerabilities currently plaguing software written in C and C++. One can argue that the GPU execution environment makes software development more error prone. Unlike C and C++, CUDA features multiple, distinct memory spaces to map to the GPU’s unique memory hierarchy, and a typical CUDA program has thousands of concurrently executing threads. Furthermore, the CUDA platform has fewer guardrails than CPU platforms that have been forced to incrementally adjust to a barrage of security attacks. Unfortunately, the peculiarities of the GPU make it difficult to directly port memory safety solutions from the CPU space. This paper presents cuCatch, a new memory safety error detection tool designed specifically for the CUDA programming model. cuCatch combines optimized compiler instrumentation with driver support to implement a novel algorithm for catching spatial and temporal memory safety errors with low performance overheads. Our experimental results on a wide set of GPU applications show that cuCatch incurs a 19% runtime slowdown on average, which is orders of magnitude faster than state-of-the-art debugging tools on GPUs. Moreover, our quantitative evaluation demonstrates cuCatch’s higher error detection coverage compared to prior memory safety tools. The combination of high error detection coverage and low runtime overheads makes cuCatch an ideal candidate for accelerating memory safety debugging for GPU applications.
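
The abstract distinguishes spatial errors (an access outside an allocation's bounds) from temporal errors (an access to an allocation that has already been freed). The toy model below only illustrates that distinction. It is not cuCatch's algorithm, which the abstract does not describe, and the `ShadowTable` class and its methods are hypothetical names introduced for this sketch.

```python
class ShadowTable:
    """Toy allocation metadata: base address -> [size, live flag].

    Assumption: a simplified host-side model for illustration,
    unrelated to how cuCatch or the CUDA driver track allocations.
    """

    def __init__(self):
        self._allocs = {}
        self._next_base = 0x1000

    def alloc(self, size):
        base = self._next_base
        self._allocs[base] = [size, True]
        # Leave a gap so that distinct allocations never overlap.
        self._next_base += size + 0x100
        return base

    def free(self, base):
        # Metadata is kept after free so later accesses can be classified.
        self._allocs[base][1] = False

    def check(self, base, offset):
        """Classify an access at base + offset."""
        size, live = self._allocs[base]
        if not live:
            return "temporal"  # use after free
        if not 0 <= offset < size:
            return "spatial"   # out of bounds
        return "ok"


table = ShadowTable()
buf = table.alloc(16)
in_bounds = table.check(buf, 15)      # last valid byte
out_of_bounds = table.check(buf, 16)  # one past the end
table.free(buf)
after_free = table.check(buf, 0)      # allocation no longer live
```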

Speculative reconvergence for improved SIMT efficiency

Damani, Johnson, Stephenson, et al. 2020

Common Subexpression Convergence: A New Code Optimization for SIMT Processors

Damani, Sarkar 2021

Memory access scheduling to reduce thread migrations

Damani, Barua, Sarkar 2022
