This work presents an energy-efficient SRAM with embedded dot-product computation capability for binary-weight convolutional neural networks. An SRAM array based on a 10T bit-cell stores the 1-b filter weights. The array implements the dot-product as a weighted average of the bit-line voltages, which are proportional to the digital input values. Local integrating ADCs compute the digital convolution outputs corresponding to each filter. We have demonstrated functionality (>98% accuracy) on the 10,000 test images of the MNIST handwritten-digit recognition dataset, using 6-b inputs/outputs. Compared to conventional all-digital implementations with small bit-widths, we achieve similar or better energy efficiency by reducing data transfer, enabled by the highly parallel in-memory analog computation.
Convolutional Neural Networks (CNNs) provide state-of-the-art results in a wide variety of machine learning (ML) applications, ranging from image classification to speech recognition. However, they are highly computation-intensive and require large amounts of storage. Recent works have strived to reduce the size of CNNs; e.g., [1] proposed a binary-weight network (BWN), where the filter weights (wi's) are +1/-1 (with a common scaling factor 'α' per filter). This leads to a significant reduction in the storage required for the wi's, making it possible to store them entirely on-chip. However, in a conventional all-digital implementation [2-3], reading the wi's from the on-chip SRAMs incurs substantial data movement per computation and hence makes such implementations energy-hungry. To reduce data movement, and consequently energy, we propose an SRAM-embedded convolution computation architecture (Fig. 1), which does not require reading the wi's explicitly from the memory. To the best of our knowledge, this is the first work to demonstrate SRAM-based convolutions for CNNs on silicon. Prior works on embedded ML classifiers have focused on 1-b outputs [4] or a small number of output classes [5], neither of which is sufficient for CNNs. In this work we use 6-b inputs/outputs, which was found to be sufficient to maintain good accuracy for most popular CNNs [1]. The convolution operation is implemented as voltage averaging (Fig. 1), since the wi's are binary.
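To make the computation concrete, the following is a minimal numerical sketch of the BWN dot-product and of the voltage-averaging operation described above. The function names, the quantization details, and the [0, 1] input normalization are illustrative assumptions, not taken from the paper; the sketch only shows that with binary wi's the dot-product reduces to a signed average of the (quantized) inputs, scaled per filter by α.

```python
import numpy as np

def bwn_dot_product(x, w, alpha):
    """Ideal BWN dot-product: w_i in {+1, -1}, common per-filter scale alpha."""
    assert set(np.unique(w)).issubset({-1, +1})
    return alpha * np.dot(x, w)

def analog_average_model(x, w, n_bits=6):
    """Illustrative model of the in-SRAM computation: each n-bit digital input
    is mapped to a bit-line voltage (normalized here to [0, 1]); the stored
    1-b weight selects the sign, and the array output is the average of the
    signed voltages (division by N models the charge averaging)."""
    N = len(x)
    levels = 2**n_bits - 1
    # quantize inputs to n_bits (assumed uniform quantization in [0, 1])
    x_q = np.clip(np.round(x * levels), 0, levels) / levels
    return np.sum(np.where(w > 0, x_q, -x_q)) / N
```

The averaged analog value is proportional to the ideal dot-product (up to the 1/N factor and 6-b quantization error), which is why a local integrating ADC recovering this average yields the digital convolution output for each filter.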