Area- and power-constrained edge devices are increasingly used to run compute-intensive workloads, which calls for accelerators that are correspondingly area- and power-efficient. In this context, in-SRAM computing performs hundreds of parallel operations on the spatially local data common in many emerging workloads, while reducing the power consumed by data movement. However, in-SRAM computing faces many challenges, including integration into the existing architecture, arithmetic operation support, data corruption at high operating frequencies, inability to run at low voltages, and low area density. To meet these challenges, this work introduces BLADE, a BitLine Accelerator for Devices on the Edge. BLADE is an in-SRAM computing architecture that uses local wordline groups to perform computations at a frequency 2.8x higher than state-of-the-art in-SRAM computing architectures. BLADE is integrated into the cache hierarchy of low-voltage edge devices, and is simulated and benchmarked at the transistor, architecture, and software abstraction levels. Experimental results demonstrate performance and energy gains over an equivalent NEON-accelerated processor for a variety of edge device workloads, namely cryptography (4x performance gain/6x energy reduction), video encoding (6x/2x), and convolutional neural networks (3x/1.5x), while maintaining the highest frequency/energy ratio (up to 2.2 GHz at 1 V) of any conventional in-SRAM computing architecture and a low area overhead of less than 8%.
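As a rough illustration of the bitline computing principle that in-SRAM architectures build on, the sketch below models what happens when two wordlines are activated at once on a shared, precharged bitline pair: the bitline stays high only where both cells store 1 (AND), and the complementary bitline stays high only where both store 0 (NOR). This is a generic behavioural model, not BLADE's circuit; the function names are hypothetical, and the local wordline groups described in the abstract are not modelled.

```python
def bitline_op(row_a, row_b):
    """Behavioural model of activating two wordlines on shared bitlines.

    row_a and row_b are lists of 0/1 bits stored in two SRAM rows.
    Returns (bl, blb): the values sensed on the bitline and on the
    complementary bitline for every column, computed in parallel.
    """
    # BL is precharged high; any cell storing 0 discharges it -> AND.
    bl = [a & b for a, b in zip(row_a, row_b)]
    # BLB is precharged high; any cell storing 1 discharges it -> NOR.
    blb = [1 - (a | b) for a, b in zip(row_a, row_b)]
    return bl, blb


def xor_from_bitlines(bl, blb):
    """Derive XOR from the two sensed values: NOT(AND) AND NOT(NOR)."""
    return [(1 - x) & (1 - y) for x, y in zip(bl, blb)]


if __name__ == "__main__":
    and_bits, nor_bits = bitline_op([0, 0, 1, 1], [0, 1, 0, 1])
    print(and_bits, nor_bits, xor_from_bitlines(and_bits, nor_bits))
```

Every column is evaluated in the same access, which is where the "hundreds of parallel operations" come from; more complex arithmetic would need extra logic at the periphery that this sketch leaves out.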
While standalone Flash memories (NAND) are approaching their physical limits, resistive switching memories (RRAM) are emerging as a candidate for a high-density, low-cost, low-energy NAND replacement. However, deeply scaled, high-density RRAM architectures such as crosspoint have been shown to suffer from voltage drop (IR drop) in the metal lines, periphery overhead, and metal-line charging time, caused by the current injected during programming operations and by sneak currents through unselected bitcells. In this work, we first propose several models for IR drop, periphery overhead, and array-line charging time that account for in-array multiple-bit write operations. Then, we introduce a new methodology for crosspoint memory design that determines the IR drop, periphery overhead, and timing associated with the optimal characteristics of a 1 selector-1 resistor (1S1R) device. We apply the proposed methodology to memory technology nodes with metal half-pitches from 50 nm to 15 nm and to written word sizes from 1 to 32 bits. We show that, for 1 bit programmed per array, the RRAM programming current has to be lower than 30 µA and the selector leakage current lower than 10 nA, and that these constraints become more severe as soon as multiple bits are written simultaneously in the same array. This suggests that massively parallel multi-bank writes of a small number of bits per array are the best way for RRAM memories to compete with NAND memories.
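To make the IR-drop argument concrete, here is a minimal first-order sketch of the voltage lost along a single array line. It is not the model proposed in the paper. It assumes a driver at one end of the line, a fixed resistance per line segment, selected cells placed at the far end (the worst case), and a constant leakage current per unselected cell. The function names, the 5 Ω segment resistance, and the 1024-cell line length are hypothetical; only the 30 µA and 10 nA figures come from the abstract.

```python
def line_ir_drop(cell_currents, r_segment):
    """Voltage drop from the driver to the last cell on one metal line.

    cell_currents[i] is the current sunk by cell i, with cell 0 nearest
    the driver. Each segment carries the current of every cell behind it.
    """
    drop = 0.0
    downstream = sum(cell_currents)
    for current in cell_currents:
        # The segment feeding this cell carries all remaining current.
        drop += r_segment * downstream
        downstream -= current
    return drop


def worst_case_drop(n_cells, i_prog, i_leak, r_segment, n_selected=1):
    """Drop when the selected cells sit at the far end of the line."""
    currents = [i_leak] * (n_cells - n_selected) + [i_prog] * n_selected
    return line_ir_drop(currents, r_segment)


if __name__ == "__main__":
    # Assumed line: 1024 cells, 5 ohm per segment (hypothetical values).
    for bits in (1, 4, 32):
        v = worst_case_drop(1024, 30e-6, 10e-9, 5.0, n_selected=bits)
        print(f"{bits:2d} bit(s) written: {v:.3f} V lost along the line")
```

Even in this simplified form, the drop grows roughly in proportion to the number of bits written on the same line, which is consistent with the abstract's conclusion that writing few bits per array across many banks is preferable.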