Computers based on the von Neumann architecture physically separate the computation and storage units: data is shuttled between the computation unit (processor) and the memory unit to realize logic/arithmetic and storage functions. This to-and-fro movement of data leads to a fundamental limitation of modern computers, known as the memory wall. Logic in-Memory (LIM) approaches aim to address this bottleneck by computing inside the memory units, thereby eliminating the energy-intensive and time-consuming data movement. However, most LIM approaches reported in the literature are not truly "simultaneous": during a LIM operation, the bitcell can be used only as a memory cell or only as a logic cell, and is not capable of storing both the memory and logic outputs at the same time. Here, we propose a novel 'Simultaneous Logic in-Memory' (SLIM) methodology that implements both memory and logic operations simultaneously on the same bitcell in a non-destructive manner, without losing the previously stored memory state. Through extensive experiments, we demonstrate the SLIM methodology using non-filamentary bilayer analog OxRAM devices with NMOS transistors (2T-1R bitcell). A detailed programming scheme, an array-level implementation, and a controller architecture are also proposed. Furthermore, to study the impact of introducing a SLIM array into the memory hierarchy, a simple image processing application (edge detection) is also investigated. It is estimated that by performing all computations inside the SLIM array, the total Energy Delay Product (EDP) is reduced by ~40x in comparison to a modern-day computer. The EDP saving owing to the reduction in data transfer between CPU and memory is observed to be ~780x.

Over the past few decades, the performance gap between the computing unit (where the data is processed) and the memory unit (where the data is stored) has kept widening; this is popularly known as the memory wall [1].
It is observed that for many computing tasks, most of the time and energy is consumed by data transfer between the processing unit and the memory unit rather than by the computation itself [2]. To tackle this bottleneck, various solutions have been proposed, spanning from the component level to the system-architecture level. Measures include the extensive use of spatial architectures (distributed on-chip memory placed closer to the computation unit) that enable parallelism through vector processing units with a large number of cores [3]. Furthermore, accelerators have been designed to match the exact data flow of specific computing algorithms [4]. Three-dimensional memories, commercialized as Hybrid Memory Cube [5] and High Bandwidth Memory [6] chips, have been proposed to meet the requirements of high data-transfer rate and high memory density. These deliver an order of magnitude higher bandwidth and reduce access energy by up to 5x relative to existing 2-dimensional DRAMs [7]. Moving further, emerging non-volatile memories (NVMs) have been introduced into the traditional memory hierarchy to minimize the 'gap' between the computing and data units [8]. However, t...
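The edge-detection case study mentioned in the SLIM abstract is a representative workload for in-memory arithmetic. As a rough sketch of that workload (the exact kernel used in the paper is not stated here; a 3x3 Sobel operator with an L1 gradient magnitude is assumed for illustration):

```python
# Hedged sketch: a minimal gradient-based edge detector of the kind an
# in-memory-compute array could accelerate. The 3x3 Sobel kernels below
# are an assumption, not the paper's actual implementation.

def sobel_edges(img):
    """img: 2D list of ints; returns an L1 gradient-magnitude map
    (same size, with a zero border where the kernel does not fit)."""
    gx_k = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # horizontal gradient
    gy_k = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # vertical gradient
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(gx_k[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(gy_k[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            out[y][x] = abs(gx) + abs(gy)   # L1 approximation of magnitude
    return out

# A vertical step edge produces a strong response along the boundary.
img = [[0, 0, 255, 255]] * 4
edges = sobel_edges(img)
```

Each output pixel is a small multiply-accumulate over a 3x3 neighborhood, which is exactly the kind of repetitive, data-local operation that benefits from being computed inside the memory array instead of being shuttled to the CPU.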
In this paper, we present an enhanced version of our neuromorphic hardware simulation framework, MASTISK (MAchine-learning and Synaptic-plasticity Technology Integration Simulation frameworK). We integrate short-term plasticity (STP) into the simulator to bring it closer to biological functionality, and we introduce a novel cross-platform methodology for implementing STP synapses with user-tunable parameters using two-terminal emerging non-volatile memory devices. To study the impact of the proposed STP synapse circuit, a case study based on a non-filamentary bilayer oxide-based resistive memory device (a Ta/HfO2/Al-doped TiO2/TiN device stack) is presented. The key performance parameters extracted from MASTISK are: mean square error (in terms of neuron firing rate), total STP synaptic device switching energy, and worst-case device switching activity. We compare the performance of a pure long-term-plasticity (LTP) network against a network with LTP + STP. The results indicate a marginal loss in learning accuracy, but greater stability in synaptic weight updates and enhanced noise tolerance. We also analyze the impact of the user-tunable parameters of the proposed STP synapse circuit: (1) the STP history buffer size and (2) the STP update threshold. Our analysis of the learning performance indicates a trend similar to that of the hyper-parameters used for regularization in artificial neural networks and support vector machines. Our analysis of device switching energy and switching activity indicates how to trade off endurance and power against learning accuracy. We also study the impact of device non-linearity by simulating multiple devices with different asymmetric non-linearity values.
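One plausible reading of the history-buffer mechanism described above is that presynaptic spikes are accumulated in a short window, and a device write is committed only when activity crosses the update threshold. The sketch below illustrates that reading; the class name, method names, and update policy are assumptions, not the simulator's actual API.

```python
from collections import deque

# Hedged sketch: an STP synapse whose two user-tunable parameters are the
# history buffer size and the update threshold, as named in the abstract.
# The concrete update rule (fixed dw, clear-on-commit) is illustrative.

class STPSynapse:
    def __init__(self, weight, buffer_size=4, threshold=3):
        self.weight = weight                      # long-term (LTP) weight
        self.history = deque(maxlen=buffer_size)  # recent pre-spike record
        self.threshold = threshold                # spikes needed to trigger a write

    def pre_spike(self, dw=0.05):
        """Record a presynaptic spike; commit a weight update only when
        enough recent activity has accumulated (short-term filtering)."""
        self.history.append(1)
        if sum(self.history) >= self.threshold:
            self.weight += dw        # device write happens only here
            self.history.clear()     # reset the short-term trace
            return True              # switching event (costs energy)
        return False                 # filtered out: no device write

syn = STPSynapse(weight=0.5, buffer_size=4, threshold=3)
writes = [syn.pre_spike() for _ in range(5)]
```

Under this reading, the buffer size and threshold act like regularization knobs: a larger threshold filters out more spurious updates (fewer device writes, better endurance) at the cost of slower weight adaptation, which matches the trade-off the abstract reports.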
Conventional DNN (deep neural network) implementations rely on networks with sizes on the order of megabytes and computational complexity on the order of tera-FLOPs (trillions of floating point operations). However, implementing such networks in the context of edge AI (artificial intelligence) is limited by the requirement for high-precision computation blocks, the large memory footprint, and the memory wall. To address this, low-precision DNN implementations based on IMC (in-memory computing) approaches utilizing NVM (non-volatile memory) devices have recently been explored. In this work, we experimentally demonstrate a dual-configuration XNOR (exclusive NOR) IMC bitcell. The bitcell is realized using fabricated 1T-1R SiOx RRAM (resistive random access memory) arrays. We analyze the trade-offs in circuit overhead, energy, and latency for both IMC bitcell configurations. Furthermore, we demonstrate the functionality of the proposed IMC bitcells with MobileNet-architecture-based BNNs (binarized neural networks). The networks are trained on the VWW (Visual Wake Words) and CIFAR-10 datasets, reaching inference accuracies of ≈80.3% and ≈84.9%, respectively. Additionally, the impact of simulated BER (bit error rate) on BNN accuracy is analyzed.
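The arithmetic that an XNOR IMC bitcell array evaluates in place can be sketched in software. With weights and activations constrained to {-1, +1} (stored as bits 0/1), a dot product reduces to a bitwise XNOR followed by a popcount; the encoding below is the standard one for binarized networks, not a detail taken from the cited work.

```python
# Hedged sketch: XNOR-popcount evaluation of a binary dot product,
# the operation an XNOR IMC bitcell array computes inside the memory.
# Bit 1 encodes +1, bit 0 encodes -1.

def bnn_dot(w_bits, a_bits):
    """Binary dot product via XNOR + popcount."""
    n = len(w_bits)
    # XNOR is true exactly when weight and activation bits agree.
    matches = sum(1 for w, a in zip(w_bits, a_bits) if not (w ^ a))
    # Map the popcount back to the {-1, +1} domain:
    # each match contributes +1, each mismatch -1.
    return 2 * matches - n

w = [1, 0, 1, 1]   # encodes [+1, -1, +1, +1]
a = [1, 1, 0, 1]   # encodes [+1, +1, -1, +1]
result = bnn_dot(w, a)   # matches at positions 0 and 3: 2*2 - 4 = 0
```

The full-precision equivalent, (+1)(+1) + (-1)(+1) + (+1)(-1) + (+1)(+1) = 0, agrees with the XNOR-popcount result, which is why binarized inference maps so cheaply onto bitwise in-memory hardware.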
Recent advances in artificial intelligence (AI) have led to successful solutions for numerous applications by utilizing deep neural network (DNN) architectures. [1] Hence, specialized hardware accelerators have been developed to facilitate high-speed computations for these data-intensive workloads. [2] While these computational engines have enabled several advanced applications at the cloud scale, the true benefits of AI can be realized by enabling low-power edge computing. For Internet of Things (IoT) devices with constrained area and power budgets, performing high-precision computations becomes infeasible. Quantization of neural network (NN) weights and activations has been explored as a means of reducing the energy cost of computations while preserving computational accuracy. [3] However, memory capacity and bandwidth can be observed as primary limiting factors
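The quantization step mentioned above can be illustrated with a minimal sketch. Symmetric uniform quantization is assumed here as a representative scheme; the bit widths and scaling policy are illustrative, not taken from the cited works.

```python
# Hedged sketch: symmetric uniform quantization of NN weights to signed
# integers, the kind of precision reduction the paragraph refers to.

def quantize(weights, bits=8):
    """Quantize a list of floats to signed ints; returns (ints, scale)."""
    qmax = 2 ** (bits - 1) - 1                     # e.g. 127 for 8 bits
    peak = max(abs(w) for w in weights)
    scale = peak / qmax if peak else 1.0           # guard against all-zero input
    q = [round(w / scale) for w in weights]
    return q, scale

w = [0.5, -1.0, 0.25]
q, s = quantize(w, bits=8)
restored = [qi * s for qi in q]   # dequantized approximation of w
```

Storing `q` instead of `w` shrinks the memory footprint (8-bit integers versus 32-bit floats) and lets the hardware use cheap integer multiply-accumulates, at the cost of a small reconstruction error bounded by half the scale.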