Neural-network training can be slow and energy intensive, owing to the need to transfer the weight data for the network between conventional digital memory chips and processor chips. Analogue non-volatile memory can accelerate the neural-network training algorithm known as backpropagation by performing parallelized multiply-accumulate operations in the analogue domain at the location of the weight data. However, the classification accuracies of such in situ training using non-volatile-memory hardware have generally been less than those of software-based training, owing to insufficient dynamic range and excessive weight-update asymmetry. Here we demonstrate mixed hardware-software neural-network implementations that involve up to 204,900 synapses and that combine long-term storage in phase-change memory, near-linear updates of volatile capacitors and weight-data transfer with 'polarity inversion' to cancel out inherent device-to-device variations. We achieve generalization accuracies (on previously unseen data) equivalent to those of software-based training on various commonly used machine-learning test datasets (MNIST, MNIST-backrand, CIFAR-10 and CIFAR-100). The computational energy efficiency of 28,065 billion operations per second per watt and throughput per area of 3.6 trillion operations per second per square millimetre that we calculate for our implementation exceed those of today's graphics processing units by two orders of magnitude. This work provides a path towards hardware accelerators that are both fast and energy efficient, particularly on fully connected neural-network layers.
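The parallelized multiply-accumulate described above maps naturally onto a crossbar read: weights are stored as conductance pairs, inputs are applied as row voltages, and column currents deliver the sums. The sketch below is a rough NumPy illustration of that idea, not the authors' code; the layer sizes, noise level and names (g_plus, g_minus, crossbar_mac) are placeholders.

```python
# Minimal NumPy sketch (not the paper's implementation) of an analogue
# multiply-accumulate on a crossbar: each weight is the difference of a
# conductance pair, and additive read noise stands in for device non-idealities.
import numpy as np

rng = np.random.default_rng(0)

n_in, n_out = 784, 250                                # e.g. one fully connected layer
g_plus = rng.uniform(0.0, 1.0, size=(n_in, n_out))    # normalised conductances (assumed)
g_minus = rng.uniform(0.0, 1.0, size=(n_in, n_out))

def crossbar_mac(x, g_plus, g_minus, noise_std=0.01):
    """Emulate one analogue MAC: effective weight = G+ - G-,
    column currents = inputs summed through the conductances."""
    w_eff = g_plus - g_minus                           # differential weight encoding
    currents = x @ w_eff                               # Ohm's law + Kirchhoff summation
    return currents + noise_std * rng.standard_normal(currents.shape)

x = rng.uniform(0.0, 1.0, size=(1, n_in))              # input activations as row voltages
y = crossbar_mac(x, g_plus, g_minus)
print(y.shape)                                         # (1, 250)
```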
High-temperature data retention is a critical hurdle for the commercialization of emerging non-volatile memories. For Conductive-Bridge RAM (CBRAM) [1], we discuss high-temperature retention in terms of the physics of quantum point contacts, and we report on a family of CBRAM cells that achieve excellent retention at temperatures exceeding 200 °C.
We report the first demonstration of 200 mm InGaAs-on-insulator (InGaAs-o-I) fabricated by direct wafer bonding from a donor wafer consisting of a III-V heteroepitaxial structure grown on a 200 mm silicon wafer. The measured threading dislocation density of the In₀.₅₃Ga₀.₄₇As (InGaAs) active layer is 3.5 × 10⁹ cm⁻², and it does not degrade after the bonding and layer-transfer steps. The surface roughness of the InGaAs layer can be improved by a chemical-mechanical polishing step, reaching values as low as 0.4 nm root-mean-square. The electron Hall mobility in a 450 nm thick InGaAs-o-I layer reaches values of up to 6000 cm²/V·s, and working pseudo-MOS transistors are demonstrated with an extracted electron mobility in the range of 2000–3000 cm²/V·s. Finally, the fabrication of an InGaAs-o-I substrate with an active layer as thin as 90 nm is achieved with a buried oxide of 50 nm. These results open the way to very large scale production of III-V-o-I advanced substrates for future CMOS technology nodes.
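For readers unfamiliar with where Hall mobility figures such as the ones above come from, the short sketch below works through a generic Hall-measurement calculation (sheet carrier density from the Hall voltage, mobility from the sheet resistance). The numerical readings are placeholders chosen for illustration only and are not measurement data from this work.

```python
# Illustrative Hall-mobility calculation (placeholder numbers, not data from the paper).
q = 1.602e-19          # elementary charge, C

# Assumed example readings from a Hall-bar / van der Pauw style measurement:
I = 1e-4               # drive current, A
B = 0.5                # magnetic field, T
V_H = 2.6e-3           # measured Hall voltage, V
R_sheet = 200.0        # measured sheet resistance, ohm/sq

n_s = I * B / (q * abs(V_H))          # sheet carrier density, m^-2
mu = 1.0 / (q * n_s * R_sheet)        # Hall mobility, m^2/(V*s)

print(f"n_s = {n_s * 1e-4:.3e} cm^-2")        # convert m^-2 -> cm^-2
print(f"mu  = {mu * 1e4:.0f} cm^2/(V*s)")     # convert m^2/(V*s) -> cm^2/(V*s)
```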
Novel Deep Neural Network (DNN) accelerators based on crossbar arrays of non-volatile memories (NVMs)—such as Phase-Change Memory or Resistive Memory—can implement multiply-accumulate operations in a highly parallelized fashion. In such systems, computation occurs in the analog domain at the location of weight data encoded into the conductances of the NVM devices. This allows DNN training of fully-connected layers to be performed faster and with less energy. Using a mixed-hardware-software experiment, we recently showed that by encoding each weight into four distinct physical devices—a “Most Significant Conductance” pair (MSP) and a “Least Significant Conductance” pair (LSP)—we can train DNNs to software-equivalent accuracy despite the imperfections of real analog memory devices. We surmised that, by dividing the task of updating and maintaining weight values between the two conductance pairs, this approach should significantly relax the otherwise quite stringent device requirements. In this paper, we quantify these relaxed requirements for analog memory devices exhibiting a saturating conductance response, assuming either an immediate or a delayed steep initial slope in conductance change. We discuss requirements on the LSP imposed by the “Open Loop Tuning” performed after each training example and on the MSP due to the “Closed Loop Tuning” performed periodically for weight transfer between the conductance pairs. Using simulations to evaluate the final generalization accuracy of a trained four-neuron-layer fully-connected network, we quantify the required dynamic range (as controlled by the size of the steep initial jump), the tolerable device-to-device variability in both maximum conductance and maximum conductance change, the tolerable pulse-to-pulse variability in conductance change, and the tolerable device yield, for both the LSP and MSP devices. We also investigate various Closed Loop Tuning strategies and describe the impact of the MSP/LSP approach on device endurance.
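As a rough illustration of the MSP/LSP bookkeeping described above (assumptions only, not the authors' implementation), the sketch below encodes a scalar weight as F · (MSP⁺ − MSP⁻) + (LSP⁺ − LSP⁻), applies open-loop pulses with a saturating conductance response after each example, and periodically performs an idealized closed-loop transfer of the accumulated weight into the MSP. The gain factor F, baseline conductances and pulse model are all placeholder choices.

```python
# Toy sketch of the MSP/LSP weight scheme (assumed parameters, idealized transfer).
import numpy as np

G_MAX, DG0, F = 1.0, 0.05, 5.0        # saturation level, initial jump size, gain factor

def pulse(g, sign):
    """Saturating conductance response: each pulse adds less as g approaches G_MAX."""
    return np.clip(g + sign * DG0 * (1.0 - g / G_MAX), 0.0, G_MAX)

def weight(msp_p, msp_m, lsp_p, lsp_m):
    return F * (msp_p - msp_m) + (lsp_p - lsp_m)

msp_p = msp_m = lsp_p = lsp_m = 0.2    # scalar example; arrays work the same way

# Open-loop tuning: fire one pulse on the LSP in the direction of the desired update.
for step, desired in enumerate([+1, +1, -1, +1, +1, +1, -1, +1], start=1):
    if desired > 0:
        lsp_p = pulse(lsp_p, +1)
    else:
        lsp_m = pulse(lsp_m, +1)

    # Closed-loop tuning every 4 examples: read the weight, re-program the MSP to
    # carry it (done iteratively in hardware, idealized here), then reset the LSP.
    if step % 4 == 0:
        w = weight(msp_p, msp_m, lsp_p, lsp_m)
        msp_p, msp_m = 0.2 + max(w, 0) / F, 0.2 + max(-w, 0) / F
        lsp_p = lsp_m = 0.2

print(weight(msp_p, msp_m, lsp_p, lsp_m))
```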