Minesh Patel scite author profile

Processing-in-memory (PIM) architectures cannot use traditional approaches to cache coherence due to the high off-chip traffic consumed by coherence messages. We propose LazyPIM, a new hardware cache coherence mechanism designed specifically for PIM. LazyPIM uses a combination of speculative cache coherence and compressed coherence signatures to greatly reduce the overhead of keeping PIM coherent with the processor. We find that LazyPIM improves average performance across a range of PIM applications by 49.1% over the best prior approach, coming within 5.5% of an ideal PIM mechanism.

show abstract

D-RaNGe: Using Commodity DRAM Devices to Generate True Random Numbers with Low Latency and High Throughput

Kim

Patel

Hassan

et al. 2019

105

View full text Add to dashboard Cite

We propose a new DRAM-based true random number generator (TRNG) that leverages DRAM cells as an entropy source.The key idea is to intentionally violate the DRAM access timing parameters and use the resulting errors as the source of randomness. Our technique speci cally decreases the DRAM row activation latency (timing parameter t RCD ) below manufacturerrecommended speci cations, to induce read errors, or activation failures, that exhibit true random behavior. We then aggregate the resulting data from multiple cells to obtain a TRNG capable of providing a high throughput of random numbers at low latency.To demonstrate that our TRNG design is viable using commodity DRAM chips, we rigorously characterize the behavior of activation failures in 282 state-of-the-art LPDDR4 devices from three major DRAM manufacturers. We verify our observations using four additional DDR3 DRAM devices from the same manufacturers. Our results show that many cells in each device produce random data that remains robust over both time and temperature variation. We use our observations to develop D-RaNGe, a methodology for extracting true random numbers from commodity DRAM devices with high throughput and low latency by deliberately violating the read access timing parameters. We evaluate the quality of our TRNG using the commonly-used NIST statistical test suite for randomness and nd that D-RaNGe: 1) successfully passes each test, and 2) generates true random numbers with over two orders of magnitude higher throughput than the previous highest-throughput DRAM-based TRNG.

show abstract

Solar-DRAM: Reducing DRAM Access Latency by Exploiting the Variation in Local Bitlines

Kim

Patel

Hassan

et al. 2018

View full text Add to dashboard Cite

Understanding and Modeling On-Die Error Correction in Modern DRAM: An Experimental Study Using Real Devices

Patel

Kim

Hassan

et al. 2019

View full text Add to dashboard Cite

Are We Susceptible to Rowhammer? An End-to-End Methodology for Cloud Providers

Cojocar

Kim

Patel

et al. 2020

View full text Add to dashboard Cite

Cloud providers are concerned that Rowhammer poses a potentially critical threat to their servers, yet today they lack a systematic way to test whether the DRAM used in their servers is vulnerable to Rowhammer attacks. This paper presents an endto-end methodology to determine if cloud servers are susceptible to these attacks. With our methodology, a cloud provider can construct worst-case testing conditions for DRAM.We apply our methodology to three classes of servers from a major cloud provider. Our findings show that none of the CPU instruction sequences used in prior work to mount Rowhammer attacks create worst-case DRAM testing conditions. To address this limitation, we develop an instruction sequence that leverages microarchitectural side-effects to "hammer" DRAM at a near-optimal rate on modern Intel Skylake and Cascade Lake platforms. We also design a DDR4 fault injector that can reverse engineer row adjacency for any DDR4 DIMM. When applied to our cloud provider's DIMMs, we find that DRAM rows do not always follow a linear map.

show abstract

SIMDRAM: a framework for bit-serial SIMD processing using DRAM

Hajinazar

Oliveira

Gregorio

et al. 2021

View full text Add to dashboard Cite

Processing-using-DRAM has been proposed for a limited set of basic operations (i.e., logic operations, addition). However, in order to enable full adoption of processing-using-DRAM, it is necessary to provide support for more complex operations. In this paper, we propose SIMDRAM, a flexible general-purpose processing-using-DRAM framework that (1) enables the efficient implementation of complex operations, and (2) provides a flexible mechanism to support the implementation of arbitrary user-defined operations. The SIMDRAM framework comprises three key steps. The first step builds an efficient MAJ/NOT representation of a given desired operation. The second step allocates DRAM rows that are reserved for computation to the operation's input and output operands, and generates the required sequence of DRAM commands to perform the MAJ/NOT implementation of the desired operation in DRAM. The third step uses the SIMDRAM control unit located inside the memory controller to manage the computation of the operation from start to end, by executing the DRAM commands generated in the second step of the framework. We design the hardware and ISA support for SIMDRAM framework to (1) address key system integration challenges, and (2) allow programmers to employ new SIMDRAM operations without hardware changes.We evaluate SIMDRAM for reliability, area overhead, throughput, and energy efficiency using a wide range of operations and seven real-world applications to demonstrate SIMDRAM's generality. Our evaluations using a single DRAM bank show that (1) over 16 operations, SIMDRAM provides 2.0× the throughput and 2.6× the energy efficiency of Ambit, a state-of-the-art processing-using-DRAM mechanism; (2) over seven real-world applications, SIM-DRAM provides 2.5× the performance of Ambit. Using 16 DRAM banks, SIMDRAM provides (1) 88× and 5.8× the throughput, and 257× and 31× the energy efficiency, of a CPU and a high-end GPU, respectively, over 16 operations; (2) 21× and 2.1× the performance of the CPU and GPU, over seven real-world applications. SIMDRAM incurs an area overhead of only 0.2% in a high-end CPU.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Minesh Patel

Revisiting RowHammer: An Experimental Analysis of Modern DRAM Devices and Mitigation Techniques

The DRAM Latency PUF: Quickly Evaluating Physical Unclonable Functions by Exploiting the Latency-Reliability Tradeoff in Modern Commodity DRAM Devices

LazyPIM: An Efficient Cache Coherence Mechanism for Processing-in-Memory

D-RaNGe: Using Commodity DRAM Devices to Generate True Random Numbers with Low Latency and High Throughput

Solar-DRAM: Reducing DRAM Access Latency by Exploiting the Variation in Local Bitlines

Understanding and Modeling On-Die Error Correction in Modern DRAM: An Experimental Study Using Real Devices

Are We Susceptible to Rowhammer? An End-to-End Methodology for Cloud Providers

SIMDRAM: a framework for bit-serial SIMD processing using DRAM

Contact Info

Product

Resources

About