As chips integrate ever more compute resources, the mismatch between the shapes of computation layers and the available computation resources significantly limits chip utilization. Driven by this observation, prior works have explored spatial accelerators and dataflow architectures to maximize throughput. However, spatial accelerators can increase execution latency. In this work, we first systematically investigate two execution models: (1) sequentially (temporally) launching one monolithic accelerator, and (2) spatially launching multiple accelerators. We observe a latency-throughput tradeoff between these two execution models and find that combining the two strategies yields a more efficient latency-throughput Pareto front. To achieve this, we propose a spatial-sequential architecture (SSR) and an SSR design automation framework that explore both strategies together when deploying deep learning inference. We use the 7nm AMD Versal ACAP VCK190 board to implement SSR accelerators for four end-to-end transformer-based deep learning models. SSR achieves average throughput gains of 2.53x, 35.71x, and 14.20x under different batch sizes compared to the 8nm Nvidia A10G GPU and the 16nm AMD ZCU102 and U250 FPGAs; the corresponding average energy efficiency gains are 8.51x, 6.75x, and 21.22x. Compared with the sequential-only and spatial-only solutions on VCK190, our spatial-sequential hybrid solutions achieve higher throughput under the same latency requirement and lower latency under the same throughput requirement. We also use the SSR analytical models to demonstrate how SSR can optimize solutions on other computing platforms, e.g., the 14nm Intel Stratix 10 NX.
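To make the latency-throughput tradeoff concrete, the following is a minimal first-order sketch in Python, not the paper's actual SSR analytical model: the layer workloads (LAYER_WORK), resource budget (TOTAL_RES), and utilization factors (SEQ_UTIL, SPAT_UTIL) are illustrative assumptions chosen only to show why a monolithic sequential design can win on latency while a spatial pipeline can win on throughput.

# Hypothetical first-order model of sequential vs. spatial execution.
# All constants below are illustrative assumptions, not measured values.

LAYER_WORK = [4.0, 1.0, 4.0, 1.0]   # assumed work per layer (arbitrary units)
TOTAL_RES  = 8.0                    # assumed total compute budget

SEQ_UTIL  = 0.4   # assumption: one monolithic design matches every layer poorly
SPAT_UTIL = 0.9   # assumption: per-layer accelerators match their layers well

def sequential(work, res):
    """One accelerator runs all layers back-to-back at full resources."""
    latency = sum(w / (res * SEQ_UTIL) for w in work)
    return latency, 1.0 / latency             # throughput = 1 sample / latency

def spatial(work, res):
    """Resources are split across a pipeline with one stage per layer."""
    shares = [res * w / sum(work) for w in work]       # work-proportional split
    stage_t = [w / (r * SPAT_UTIL) for w, r in zip(work, shares)]
    latency = sum(stage_t)                    # a sample traverses every stage
    return latency, 1.0 / max(stage_t)        # steady-state pipeline rate

seq_lat, seq_tp = sequential(LAYER_WORK, TOTAL_RES)
spa_lat, spa_tp = spatial(LAYER_WORK, TOTAL_RES)
print(f"sequential: latency={seq_lat:.2f}  throughput={seq_tp:.2f}/unit time")
print(f"spatial:    latency={spa_lat:.2f}  throughput={spa_tp:.2f}/unit time")

Under these assumed numbers the sequential design finishes a single sample sooner (3.12 vs. 5.56 time units), while the spatial pipeline sustains more than twice the throughput (0.72 vs. 0.32 samples per unit time), which is the Pareto tension SSR's hybrid designs exploit.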
CCS CONCEPTS
• Computer systems organization → Heterogeneous (hybrid) systems; • Hardware → Hardware-software codesign.