Ashutosh Pattnaik scite author profile

As GPUs make headway in the computing landscape spanning mobile platforms, supercomputers, cloud and virtual desktop platforms, supporting concurrent execution of multiple applications in GPUs becomes essential for unlocking their full potential. However, unlike CPUs, multi-application execution in GPUs is little explored. In this paper, we study the memory system of GPUs in a concurrently executing multi-application environment. We first present an analytical performance model for many-threaded architectures and show that the common use of misses-per-kilo-instruction (MPKI) as a proxy for performance is not accurate without considering the bandwidth usage of applications. We characterize the memory interference of applications and discuss the limitations of existing memory schedulers in mitigating this interference. We extend the analytical model to multiple applications and identify the key metrics to control various performance metrics. We conduct extensive simulations using an enhanced version of GPGPU-Sim targeted for concurrently executing multiple applications, and show that memory scheduling decisions based on MPKI and bandwidth information are more effective in enhancing throughput than the traditional FR-FCFS and the recently proposed RR FR-FCFS policies.


NEBULA: A Neuromorphic Spin-Based Ultra-Low Power Architecture for SNNs and ANNs

Singh, Sarma, Jao, et al. 2020

Exploiting Core Criticality for Enhanced GPU Performance

Jog, Kayıran, Pattnaik, et al. 2016. SIGMETRICS Perform. Eval. Rev.

Modern memory access schedulers employed in GPUs typically optimize for memory throughput. They implicitly assume that all requests from different cores are equally important. However, we show that during the execution of a subset of CUDA applications, different cores can have different amounts of tolerance to latency. In particular, cores with a larger fraction of warps waiting for data to come back from DRAM are less likely to tolerate the latency of an outstanding memory request. Requests from such cores are more critical than requests from others. Based on this observation, this paper introduces a new memory scheduler, called (C)ritica(L)ity (A)ware (M)emory (S)cheduler (CLAMS), which takes into account the latency-tolerance of the cores that generate memory requests. The key idea is to use the fraction of critical requests in the memory request buffer to switch between scheduling policies optimized for criticality and locality. If this fraction is below a threshold, CLAMS prioritizes critical requests to ensure cores that cannot tolerate latency are serviced faster. Otherwise, CLAMS optimizes for locality, anticipating that there are too many critical requests and prioritizing one over another would not significantly benefit performance. We first present a core-criticality estimation mechanism for determining critical cores and requests, and then discuss issues related to finding a balance between criticality and locality in the memory scheduler. We progressively devise three variants of CLAMS, and show that the Dynamic CLAMS provides significantly higher performance, across a variety of workloads, than the commonly-employed GPU memory schedulers optimized solely for locality. The results indicate that a GPU memory system that considers both core criticality and DRAM access locality can provide significant improvement in performance.
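The policy switch described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Request` fields, the `pick_next` function, the default threshold of 0.5, and the tie-breaking order (row hits, then oldest) are all assumptions chosen for illustration, and the `critical` flag stands in for the output of the core-criticality estimator.

```python
from dataclasses import dataclass


@dataclass
class Request:
    core_id: int    # core that issued the request
    row: int        # DRAM row the request targets
    critical: bool  # hypothetical flag from a core-criticality estimator
    arrival: int    # arrival order in the memory request buffer


def pick_next(buffer, open_row, threshold=0.5):
    """Pick the next request to service from a non-empty buffer.

    The threshold and the tie-breaking rules are illustrative assumptions.
    """
    critical_fraction = sum(r.critical for r in buffer) / len(buffer)
    if critical_fraction < threshold:
        # Few critical requests: serve critical ones first,
        # then prefer row hits, then the oldest request.
        return min(buffer, key=lambda r: (not r.critical, r.row != open_row, r.arrival))
    # Many critical requests: optimize for locality instead
    # (row hits first, then the oldest request).
    return min(buffer, key=lambda r: (r.row != open_row, r.arrival))


buffer = [
    Request(core_id=0, row=5, critical=False, arrival=0),
    Request(core_id=1, row=7, critical=True, arrival=1),
    Request(core_id=2, row=5, critical=False, arrival=2),
    Request(core_id=3, row=5, critical=False, arrival=3),
]
chosen_criticality = pick_next(buffer, open_row=5, threshold=0.5)
chosen_locality = pick_next(buffer, open_row=5, threshold=0.2)
```

With one critical request out of four, the first call falls below the threshold and services the critical request even though it misses the open row; the second call uses a lower threshold, so the same buffer is scheduled for locality and the oldest row hit is chosen.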


Controlled Kernel Launch for Dynamic Parallelism in GPUs

Tang, Pattnaik, Jiang, et al. 2017

Opportunistic computing in GPU architectures

Pattnaik, Tang, Kayıran, et al. 2019

A New and Efficient Method for Removal of High Density Salt and Pepper Noise Through Cascade Decision based Filtering Algorithm

Pattnaik, Agarwal, Chand. 2012. Procedia Technology

Peripheral blood T lymphocytosis in thymoma: an insight into immunobiology

Mishra, Padhi, Adhya, et al. 2020. J Cancer Res Clin Oncol
