Recently, deep learning has achieved substantial breakthroughs in fields such as speech recognition, image and video classification, and natural language processing. [1-3] The explosive development of deep learning has promoted the convergence of this field with other disciplines. This progress has benefited from updates and improvements to models and theories in computer science, as well as from advances in contemporary semiconductor chip technology. However, the limited bandwidth and computing resources of traditional computer systems greatly restrict execution speed as the scale of deep neural networks (DNNs) continues to grow. The traditional von Neumann architecture separates data storage from computing. Frequent and inefficient movement of data between the processor and memory or off-chip storage introduces latency and energy-consumption overheads, and the mismatch between data transmission and data processing becomes a bottleneck in the hardware implementation of deep learning.

Owing to the high-bandwidth and high-parallelism requirements of deep learning, data-intensive artificial intelligence (AI) applications have been dominated by cloud computing; that is, edge devices act as data-collecting interfaces and pass data to clustered cloud computing centers, where the deep learning computation is performed. [4] Such AI applications place high demands on network bandwidth and latency, and expose users to privacy-leakage risks. [5] For example, in areas with poor network coverage, Tesla's AI-based autonomous driving can become unreliable and even life-threatening. With the popularization of deep learning, efficient AI applications in daily life are becoming an urgent need.

Edge intelligence is a concept defined relative to cloud intelligence. [6] Edge computing requires real-time intelligence on devices with strict energy and area budgets, such as smart watches and drones.
It pushes cloud services from the network core to the network edge, closer to Internet-of-Things (IoT) devices and data sources, thereby building an end-to-end network. Physical proximity to the information-generation sources is the most crucial characteristic emphasized by edge computing; consequently, high energy efficiency, small size, low latency, and strong privacy protection become valued attributes of edge intelligence. [7-9] With the combination of hardware and AI, devices dedicated to deep learning have emerged; these devices are called neural network accelerators. The combination of traditional complementary metal-oxide-semiconductor (CMOS) technology and emerging nonvolatile memory provides a wealth of possibilities for AI accelerators. [10-13] The use of memory technology as a synaptic weight-matrix storage unit has laid a foundation for the hardware implementation of neuromorphic computing systems. In some prominent AI chips, traditional memories have been utilized; for example,