Unlike the deep neural network (DNN) inference process, the training process produces a huge amount of intermediate data to compute the updated weights of the network. Generally, the on-chip global buffer (e.g., SRAM cache) has limited capacity because of its low memory density; therefore, off-chip DRAM access is inevitable during training. In this work, a novel ferroelectric field-effect transistor (FeFET) based 3D NAND architecture for an on-chip training accelerator is proposed. The reduced peripheral circuit overhead, owing to the low operating voltage of the FeFET device, together with the ultra-high density of the 3D NAND architecture, enables storing and computing all the intermediate data on chip during training. We present a custom design of a 108 Gb chip with a 59.91 mm² area and 45% array efficiency. Data mapping schemes for weights, activations, and errors that are compatible with the 3D NAND architecture are investigated. The training performance is explored by training a ResNet-18 model on this architecture with the ImageNet dataset at 8-bit precision. Thanks to the minimized off-chip memory access, an energy efficiency of 7.76 TOPS/W is achieved for 8-bit on-chip training.