Nandita Vijaykumar scite author profile

Abstract-Deep-submicron CMOS designs maintain high transistor switching speeds by scaling down the supply voltage and proportionately reducing the transistor threshold voltage. Lowering the threshold voltage increases leakage energy dissipation due to subthreshold leakage current even when the transistor is not switching. Estimates suggest a five-fold increase in leakage energy in every future generation. In modern microarchitectures, much of the leakage energy is dissipated in large on-chip cache memory structures with high transistor densities. While cache utilization varies both within and across applications, modern cache designs are fixed in size resulting in transistor leakage inefficiencies. This paper explores an integrated architectural and circuit-level approach to reducing leakage energy in instruction caches (i-caches). At the architecture level, we propose the Dynamically ResIzable i-cache (DRI i-cache), a novel i-cache design that dynamically resizes and adapts to an application's required size. At the circuit-level, we use gated-, a novel mechanism that effectively turns off the supply voltage to, and eliminates leakage in, the SRAM cells in a DRI i-cache's unused sections. Architectural and circuit-level simulation results indicate that a DRI i-cache successfully and robustly exploits the cache size variability both within and across applications. Compared to a conventional i-cache using an aggressively-scaled threshold voltage a 64 K DRI i-cache reduces on average both the leakage energy-delay product and cache size by 62%, with less than 4% impact on execution time. Our results also indicate that a wide NMOS dual-gatedtransistor with a charge pump offers the best gating implementation and virtually eliminates leakage energy with minimal increase in an SRAM cell read time area as compared to an i-cache with an aggressively-scaled threshold voltage.

show abstract

Zorua: A holistic approach to resource virtualization in GPUs

Vijaykumar

Hsieh

Pekhimenko

et al. 2016

View full text Add to dashboard Cite

QUAC-TRNG: High-Throughput True Random Number Generation Using Quadruple Row Activation in Commodity DRAM Chips

Olgun

Patel²,

Yağlıkçı³

et al. 2021

View full text Add to dashboard Cite

True random number generators (TRNG) sample random physical processes to create large amounts of random numbers for various use cases, including security-critical cryptographic primitives, scienti c simulations, machine learning applications, and even recreational entertainment. Unfortunately, not every computing system is equipped with dedicated TRNG hardware, limiting the application space and security guarantees for such systems. To open the application space and enable security guarantees for the overwhelming majority of computing systems that do not necessarily have dedicated TRNG hardware (e.g., processing-in-memory systems), we develop QUAC-TRNG, a new high-throughput TRNG that can be fully implemented in commodity DRAM chips, which are key components in most modern systems.QUAC-TRNG exploits the new observation that a carefullyengineered sequence of DRAM commands activates four consecutive DRAM rows in rapid succession. This QUadruple ACtivation (QUAC) causes the bitline sense ampli ers to nondeterministically converge to random values when we activate four rows that store con icting data because the net deviation in bitline voltage fails to meet reliable sensing margins.We experimentally demonstrate that QUAC reliably generates random values across 136 commodity DDR4 DRAM chips from one major DRAM manufacturer. We describe how to develop an e ective TRNG (QUAC-TRNG) based on QUAC. We evaluate the quality of our TRNG using the commonly-used NIST statistical test suite for randomness and nd that QUAC-TRNG successfully passes each test. Our experimental evaluations show that QUAC-TRNG reliably generates true random numbers with a throughput of 3.44 Gb/s (per DRAM channel), outperforming the state-of-the-art DRAM-based TRNG by 15.08× and 1.41× for basic and throughput-optimized versions, respectively. We show that QUAC-TRNG utilizes DRAM bandwidth better than the state-of-the-art, achieving up to 2.03× the throughput of a throughput-optimized baseline when scaling bus frequencies to 12 GT/s.

show abstract

DAMOV: A New Methodology and Benchmark Suite for Evaluating Data Movement Bottlenecks

Oliveira¹,

Gómez-Luna

Orosa

et al. 2021

IEEE Access

View full text Add to dashboard Cite

Data movement between the CPU and main memory is a first-order obstacle against improving performance, scalability, and energy efficiency in modern systems. Computer systems employ a range of techniques to reduce overheads tied to data movement, spanning from traditional mechanisms (e.g., deep multi-level cache hierarchies, aggressive hardware prefetchers) to emerging techniques such as Near-Data Processing (NDP), where some computation is moved close to memory. Prior NDP works investigate the root causes of data movement bottlenecks using different profiling methodologies and tools. However, there is still a lack of understanding about the key metrics that can identify different data movement bottlenecks and their relation to traditional and emerging data movement mitigation mechanisms. Our goal is to methodically identify potential sources of data movement over a broad set of applications and to comprehensively compare traditional compute-centric data movement mitigation techniques (e.g., caching and prefetching) to more memory-centric techniques (e.g., NDP), thereby developing a rigorous understanding of the best techniques to mitigate each source of data movement. With this goal in mind, we perform the first large-scale characterization of a wide variety of applications, across a wide range of application domains, to identify fundamental program properties that lead to data movement to/from main memory. We develop the first systematic methodology to classify applications based on the sources contributing to data movement bottlenecks. From our large-scale characterization of 77K functions across 345 applications, we select 144 functions to form the first open-source benchmark suite (DAMOV) for main memory data movement studies. We select a diverse range of functions that (1) represent different types of data movement bottlenecks, and (2) come from a wide range of application domains. Using NDP as a case study, we identify new insights about the different data movement bottlenecks and use these insights to determine the most suitable data movement mitigation mechanism for a particular application. We open-source DAMOV and the complete source code for our new characterization methodology at https: //github.com/CMU-SAFARI/DAMOV.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.