CAP-RAM: A Charge-Domain In-Memory Computing 6T-SRAM for Accurate and Precision-Programmable CNN Inference

Chen, Zhiyu; Yu, Zhanghao; Jin, Qing; He, Yan; Wang, Jingyu; Lin, Sheng; Dai, Li; Wang, Yanzhi; Yang, Kaiyuan

doi:10.1109/jssc.2021.3056447

Cited by 59 publications

(28 citation statements)

References 33 publications

(58 reference statements)


“…In cryptographic computing, the requirement on PIM is different from that of machine learning applications because of its zero tolerance to compute errors. Thus, bit-parallel and bit-serial operations are more suitable than the lossy computing mechanisms in current, charge, or voltage domains [17], [18], [19], [20]. Recent digital in-SRAM architectures [21], [22] have been designed to compute with full precision and high parallelism.…”

Section: Processing In Memory For Cryptographic Acceleration (mentioning)

confidence: 99%

MeNTT: A Compact and Efficient Processing-in-Memory Number Theoretic Transform (NTT) Accelerator

Li, Akhil, Yang

2022

Preprint

Self Cite


Lattice-based cryptography (LBC) exploiting Learning with Errors (LWE) problems is a promising candidate for post-quantum cryptography. Number theoretic transform (NTT) is the latency-and energy-dominant process in the computation of LWE problems. This paper presents a compact and efficient in-MEmory NTT accelerator, named MeNTT, which explores optimized computation in and near a 6T SRAM array. Specifically-designed peripherals enable fast and efficient modular operations. Moreover, a novel mapping strategy reduces the data flow between NTT stages into a unique pattern, which greatly simplifies the routing among processing units (i.e., SRAM column in this work), reducing energy and area overheads. The accelerator achieves significant latency and energy reductions over prior arts.
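For readers unfamiliar with the transform this abstract refers to, the following is a minimal software sketch of a number theoretic transform and the cyclic polynomial product it enables. It is not the MeNTT design: the function names, the naive O(n²) evaluation, and the toy parameters (modulus 17, length 8, root of unity 2) are assumptions chosen only for illustration.

```python
# Naive number theoretic transform (NTT) over Z_q, for illustration only.
# Assumption: omega is a primitive n-th root of unity modulo the prime q.

def ntt(a, omega, q):
    """Forward transform: A[i] = sum_j a[j] * omega^(i*j) mod q."""
    n = len(a)
    return [sum(a[j] * pow(omega, i * j, q) for j in range(n)) % q
            for i in range(n)]

def intt(A, omega, q):
    """Inverse transform: same sum with omega^-1, scaled by n^-1 mod q."""
    n = len(A)
    n_inv = pow(n, -1, q)          # modular inverse (Python 3.8+)
    omega_inv = pow(omega, -1, q)
    return [(n_inv * x) % q for x in ntt(A, omega_inv, q)]

def cyclic_convolve(a, b, omega, q):
    """Cyclic polynomial product via pointwise multiplication in the NTT domain."""
    A = ntt(a, omega, q)
    B = ntt(b, omega, q)
    return intt([(x * y) % q for x, y in zip(A, B)], omega, q)

# Toy parameters (hypothetical): 2 has multiplicative order 8 modulo 17.
Q, OMEGA = 17, 2
a = [1, 2, 0, 0, 0, 0, 0, 0]       # 1 + 2x
b = [3, 4, 0, 0, 0, 0, 0, 0]       # 3 + 4x
product = cyclic_convolve(a, b, OMEGA, Q)
```

Every step reduces to modular multiplications and additions, which is the class of operation the abstract says the SRAM peripherals are designed to perform.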


“…Early IMC attempts [31] employed few Analog to Digital Converters (ADCs) multiplexed across many MACs [17], [22], [31]. The number of ADCs in more recent approaches is higher [24], [27], [42], aiming to maximise speed through parallelism in the computation. As a result of memory folding, bit-lines (BL) are multiplexed to a data-line (DL), so that one MAC is generated per DL at any given time.…”

Section: Introduction (mentioning)

confidence: 99%

Input Conditioned Subranging and Skewed Quantisation of MACs in IMC

Sundar, Viraraghavan, Vijayakumar

2022

Preprint


In-Memory Computation (IMC) of Neural-Network (NN) inference is done by performing the Multiply-ACcumulate (MAC) operation in the analog domain. Digitising MAC voltages in parallel by fitting Analog to Digital Converters (ADCs) within dense memory-pitches is a fundamental challenge for IMC engines. IMC works thus far rely on clipping the MAC-PDF to reduce the dynamic range, reducing the per data-line (DL) ADC precision requirement. In this work, we show that the per-DL ADC precision can be reduced even further by focusing on quantising the input Conditioned MAC-PDF (CMPDF), which spans a sub-range in the total MAC-PDF. We demonstrate on hardware a technique to locate the CMPDF in one-shot by tracking its mean. We show that quantisation levels about the CMPDF mean can be skewed to only span the portion of CMPDF that yields positive ReLU inputs, provided MACs are implemented as complete sums. Compared to symmetrically spanning CMPDF, this requires 20% to 40% fewer references at iso-accuracy for the investigated neural network layers. Hardware measured results for Fully-Connected NN inference on MNIST yielded < 1% accuracy drop when MAC-voltages were quantised with 4 bit references about the CMPDF mean as compared to full-range 6 bit ADC sensing the MAC-voltages. MATLAB evaluation using 3.5 to 4.2 bit CMPDF quantisation for MNIST-FCNN, CIFAR-10 ResNet-20 and VGG-11 inference yielded < 1% accuracy drop as compared to full-range 6 to 7 bit MAC quantisation.
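The idea of skewing the reference levels toward the ReLU-positive side can be sketched in a few lines. This is only a toy model under stated assumptions, not the authors' hardware scheme: MAC values and references are integers in LSB units, the ReLU threshold sits at 0, and the names `reference_levels`, `quantise`, and `relu` are hypothetical.

```python
# Toy model of symmetric vs. skewed quantisation references around a mean.
# Assumptions: integer LSB units, ReLU threshold at 0, nearest-level quantiser.

def reference_levels(mean, half_span, relu_threshold=0, skewed=False):
    """Reference levels spanning mean +/- half_span; the skewed variant drops
    every level below the ReLU threshold."""
    low, high = mean - half_span, mean + half_span
    if skewed:
        low = max(low, relu_threshold)
    return list(range(low, high + 1))

def quantise(value, levels):
    """Map a MAC value to the nearest available reference level (with clipping)."""
    return min(levels, key=lambda r: abs(r - value))

def relu(x):
    return max(x, 0)

symmetric = reference_levels(mean=3, half_span=5)              # -2 .. 8
skewed = reference_levels(mean=3, half_span=5, skewed=True)    #  0 .. 8

# After ReLU, both reference sets give the same result for every input tested.
same_after_relu = all(
    relu(quantise(v, symmetric)) == relu(quantise(v, skewed))
    for v in range(-10, 20)
)
```

With these toy numbers the skewed set needs 9 references instead of 11 while producing identical post-ReLU values, because levels below the threshold only distinguish inputs that ReLU maps to zero anyway.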


“…Many studies have implemented PIM based on static random-access memory (SRAM) due to its logic compatibility and high operation speed [1][2][3][4][5][6][7]. However, SRAM-based PIMs have the limitations of low bit density and large silicon area [1,2].…”

Section: Introduction (mentioning)

confidence: 99%

“…Previous eDRAM structures extended the retention time by employing an additional capacitor in the gain cell. However, the multiply-accumulate (MAC) operation in an analog PIM usually requires metal-oxide-metal (MOM) coupling capacitors [3][4][5], and a sufficiently large capacitor cannot be employed in the gain cell because of the area constraint. In addition, for the same gain cell architecture, process scaling to the ultra-deep submicron scale further reduces the retention time. As shown in Figure 2, for the same two-transistor (2T) gain cell structure [20,21], the simulated retention time decreases by approximately 300 times as the channel length decreases from 180 to 28 nm owing to an increased leakage current and a reduced parasitic capacitance.…”

Section: Introduction (mentioning)

confidence: 99%


Pseudo-Static Gain Cell of Embedded DRAM for Processing-in-Memory in Intelligent IoT Sensor Nodes

Kim, Park

2022

Sensors


This paper presents a pseudo-static gain cell (PS-GC) with extended retention time for an embedded dynamic random-access memory (eDRAM) macro for analog processing-in-memory (PIM). The proposed eDRAM cell consists of a two-transistor (2T) gain cell with a pseudo-static leakage compensation that maintains stored data without charge loss. Hence, the PS-GC can offer unlimited retention time in the same manner as static RAM (SRAM). Due to the extended retention time, the bulky capacitors of conventional eDRAM are no longer needed, thereby improving the area efficiency of eDRAM-based analog PIMs. The active leakage compensation of the PS-GC can effectively hold stored data even in a deep-submicron process that shows significant leakage current. Therefore, the PS-GC can accelerate write-access time and read-access time without concern of increased leakage current. The proposed gain cell and its 64 × 64 eDRAM macro were implemented in a 28 nm CMOS process. The bitcell of the proposed gain cell has 0.79 and 0.58 times the area of those of 6T SRAM and 8T SRAM, respectively. The post-layout simulation results demonstrate that the eDRAM successfully maintains the pseudo-static operation with unlimited retention time under wide-range variations of process, voltage and temperature. At the operating frequency of 667 MHz, the eDRAM macro achieved an operating voltage range from 0.9 to 1.2 V and an operating temperature range from −25 to 85 °C regardless of the process variation. The post-layout simulated write-access time and read-access time were below 0.3 ns at an operating temperature of 85 °C. The PS-GC consumes a static power of 2.2 nW/bit at an operating temperature of 25 °C.

