particularly for inference of previously trained deep neural networks, [1,2] as well as for neuromorphic computing. [3,4] Many factors, including resistance values, memory window, resistance drift, read noise, and programming accuracy, impact the performance of PCM in analog in-memory computing applications. We previously showed that the introduction of an additional projection liner, [5][6][7] comprised of a non-phase-change material, helps mitigate non-ideal attributes of PCM devices such as drift and noise. Here, we perform a systematic study of these electrical properties and discuss their implications for in-memory inference computing. We show that these properties are tunable through changes to the projection liner, which enables optimization of the device characteristics to improve the network accuracy of chips using these devices for in-memory computing.

As many of the device performance metrics, for example, resistance drift, memory window, and read noise, can be modulated by the liner, it is important to understand how to optimize these metrics to produce the best results for various deep neural networks (DNNs). We developed models to represent the drift and noise behavior of the PCM devices, and we use them to evaluate the performance of these devices in neural network inference applications. We evaluate large neural networks with tens of millions of weights using the PCM with and without liner, across a variety of DNNs and test datasets at multiple time steps after programming. We find that the liner devices perform well across different DNN types, including recurrent neural networks (RNNs), convolutional neural networks (CNNs), and Transformer-based networks. For RNNs, we evaluate a two-layer long short-term memory (LSTM) network on the Penn Treebank dataset. [8] For CNNs, we examine a ResNet-32 network on the CIFAR-10 dataset. [9] For Transformer-based networks, we evaluate BERT-base on the MRPC dataset [10] and the MNLI dataset.
[11] We also evaluate various weight mapping schemes, including a direct weight mapping scheme using one or two PCMs per weight and an optimized weight mapping scheme using four PCMs per weight. [12] We show that PCM with liner can improve network accuracy for all these weight mapping schemes as well as for networks with
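As a rough illustration of the kind of device models and weight mappings discussed above, the sketch below combines a standard power-law drift model and a simple multiplicative Gaussian read-noise term with a schematic four-PCM differential weight mapping. The parameter values (drift exponent `nu`, noise level `rel_sigma`, significance factor `F`) are illustrative placeholders, not fitted device values, and the mapping is only a simplified stand-in for the optimized scheme of Ref. [12].

```python
import numpy as np

def conductance_after_drift(g0, t, t0=1.0, nu=0.05):
    """Power-law resistance drift R(t) = R0*(t/t0)**nu, so the stored
    conductance decays as G(t) = G0*(t/t0)**(-nu). The exponent nu is
    illustrative; the measured value depends on device state and liner."""
    return g0 * (t / t0) ** (-nu)

def read_with_noise(g, rel_sigma=0.02, rng=None):
    """Gaussian read noise with standard deviation proportional to G
    (a simplification of the read noise characterized in the text)."""
    rng = np.random.default_rng() if rng is None else rng
    return g + rng.normal(0.0, rel_sigma * np.abs(g))

def map_weight_4pcm(w, F=8.0):
    """Split a signed weight across two differential PCM pairs:
    w = F*(Gp - Gm) + (gp - gm), with a significance factor F between
    the most- and least-significant pair (schematic version only)."""
    coarse = float(np.trunc(w / F))   # most-significant pair
    resid = w - F * coarse            # least-significant pair
    Gp, Gm = (coarse, 0.0) if coarse >= 0 else (0.0, -coarse)
    gp, gm = (resid, 0.0) if resid >= 0 else (0.0, -resid)
    return Gp, Gm, gp, gm

def effective_weight(Gp, Gm, gp, gm, F=8.0):
    """Weight read back from the four stored conductances."""
    return F * (Gp - Gm) + (gp - gm)
```

A drift-aware inference evaluation along these lines would apply `conductance_after_drift` and `read_with_noise` to each stored conductance, recombine them with `effective_weight`, and then measure network accuracy at each time step after programming.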
Phase change memory (PCM) is one of the most promising candidates for non-von Neumann analog in-memory computing, particularly for inference of previously trained deep neural networks (DNNs). It is shown that PCM electrical properties can be tuned systematically using a projection liner, which is designed for resistance drift mitigation, in the manufacturable mushroom PCM. A systematic study of the electrical properties (including resistance values, memory window, resistance drift, and read noise) and of their impact on the accuracy of large neural networks of various types, with tens of millions of weights, is performed. It is shown that the DNN accuracy can be improved by the PCM with liner both short term and long term after programming, due to red...