2019 IFIP/IEEE 27th International Conference on Very Large Scale Integration (VLSI-SoC)
DOI: 10.1109/vlsi-soc.2019.8920343

A Product Engine for Energy-Efficient Execution of Binary Neural Networks Using Resistive Memories

Abstract: The need for running complex Machine Learning (ML) algorithms, such as Convolutional Neural Networks (CNNs), in edge devices, which are highly constrained in terms of computing power and energy, makes it important to execute such applications efficiently. The situation has led to the popularization of Binary Neural Networks (BNNs), which significantly reduce execution time and memory requirements by representing the weights (and possibly the data being operated) using only one bit. Because approximately 90% of…
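To make the one-bit representation concrete, here is a minimal sketch (not the paper's code) of how full-precision weights can be binarized and bit-packed; the array sizes and the 32x figure assume 32-bit floats as the baseline:

```python
# A minimal sketch (not the paper's code) of the one-bit weight
# representation behind BNNs: weights are reduced to their signs and
# bit-packed, for a 32x memory saving over 32-bit floats.
import numpy as np

rng = np.random.default_rng(0)
w_fp32 = rng.standard_normal(1024).astype(np.float32)  # full-precision weights

w_bin = w_fp32 >= 0            # sign binarization: True -> +1, False -> -1
w_packed = np.packbits(w_bin)  # 8 weights per byte

print(w_fp32.nbytes)    # 4096 bytes (32 bits per weight)
print(w_packed.nbytes)  # 128 bytes (1 bit per weight) -> 32x smaller
```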

Cited by 8 publications (11 citation statements) · References 26 publications

“…Most closely related to our approach is the paper of Vieira et al., which details a full-system evaluation strategy of AIMC acceleration. As in our case, the authors also base their approach on AIMC-dedicated extensions to the gem5 environment [23]. Nonetheless, their approach is limited to modelling the simple case of binary CNNs, and their per-kernel mapping strategy does not scale to the larger and more general applications we tackle in this paper.…”
Section: B. Simulations of AIMC-Based Systems
Mentioning · confidence: 99%
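For illustration, here is a hedged numpy sketch of what a per-kernel mapping of the kind attributed to Vieira et al. could look like; the function names and the idealized, noise-free analog matrix-vector multiply are assumptions, not the cited tool's API:

```python
# A hedged illustration (not the cited tool's API) of "per-kernel mapping":
# each convolution kernel is flattened into one crossbar column, so every
# analog matrix-vector product yields one output value per kernel.
import numpy as np

def map_kernels_to_crossbar(kernels):
    """kernels: (n_kernels, c, kh, kw) -> crossbar of shape (c*kh*kw, n_kernels)."""
    n = kernels.shape[0]
    return kernels.reshape(n, -1).T       # one column per kernel

def aimc_mvm(crossbar, patch):
    """Idealized in-memory matrix-vector multiply (device noise not modeled)."""
    return patch.reshape(-1) @ crossbar

rng = np.random.default_rng(0)
kernels = np.sign(rng.standard_normal((8, 3, 3, 3)))  # 8 binary 3x3x3 kernels
xbar = map_kernels_to_crossbar(kernels)               # (27, 8) conductances
patch = np.sign(rng.standard_normal((3, 3, 3)))       # one input patch
print(aimc_mvm(xbar, patch))                          # 8 output activations
```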
“…By itself, this allows for the system-level implementation of classic loosely-coupled AIMC-enabled systems. To simulate the tightly-coupled AIMC-enabled architectures, we extend the accelerator modeling in [23] such that the custom ARMv8 ISA extension can also interface with peripheral I/O (PIO) devices like our wrapper object. For this, we add connections between the ISA extension and the PIO device via the system object (i.e., the simulated system that is instantiated on gem5-X's launch).…”
Section: B. AIMC-Enabled Systems in gem5-X
Mentioning · confidence: 99%
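As a speculative sketch of the wiring the quote describes (runnable only inside gem5's Python environment), the following gem5-style configuration attaches a memory-mapped accelerator wrapper to the simulated system object. `AIMCWrapper` and its parameters are hypothetical, which is why those lines stay commented out; only `System` and `SystemXBar` are standard gem5 constructs:

```python
# A speculative sketch of attaching a memory-mapped accelerator wrapper to
# the simulated system object, in the spirit of the quote. AIMCWrapper is a
# hypothetical SimObject, hence commented out.
from m5.objects import System, SystemXBar

system = System()
system.membus = SystemXBar()

# Hypothetical PIO device exposing the AIMC crossbar at a fixed address range:
# system.aimc = AIMCWrapper(pio_addr=0x2F000000, pio_size=0x1000)
# system.aimc.pio = system.membus.mem_side_ports

# A custom ISA extension would then issue loads/stores into that range to
# drive the in-memory compute, instead of going through a loosely coupled
# driver stack.
```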
“…Xilinx Vivado reports that the floating-point arithmetic units responsible for computing the distances account for 12% of the on-chip energy consumption. Furthermore, the memory accesses dominate the energy consumption of the entire system, reaching as high as 90% of the total energy consumption [24]. Consequently, the units responsible for computing the distances account for only 1% of the total energy consumption.…”
Section: Energy Efficiency Improvements
Mentioning · confidence: 99%
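The ~1% figure follows directly from combining the two quoted numbers, under the assumption that the 12% applies to the non-memory (compute) share of the energy budget:

```python
# Sanity check of the quoted figures, assuming the 12% applies to the
# non-memory (compute) share of the energy budget:
memory_share = 0.90                    # memory accesses: ~90% of total energy
compute_share = 1.0 - memory_share     # remaining on-chip compute: ~10%
distance_units = 0.12 * compute_share  # 12% of the compute share
print(f"{distance_units:.1%}")         # -> 1.2%, consistent with the ~1% quoted
```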
“…Moreover, adding logic gates to the SA can realize more complex functions like addition.

BNN (XNOR + popcnt): IMCE [117] and CA-PIM [8] on SOT-MRAM; PSA-BNN [118], SRAM-CIM [65], XNORAM [119], and XNOR-SRAM [69] on SRAM; CIM-SR [120] on SRAM and ReRAM; XNOR-BNN [113], ReRAM-BNN [64], FPSA-BNN [121], BDPE [122], and 2T2R-TCAM [66] on ReRAM; VR-XNOR [123] on memristor; EEIM-BNN [124] on SOT-MRAM; MLC-CIM [125] and PIMBALL [126] on STT-MRAM.
TNN (ternary multiplication or Gated-XNOR + popcnt): TiM-DNN [68] on SRAM-TPC; XNOR-SRAM [69] on SRAM; TeC-Cell [127] on FeRAM; 4T2R-IM-DP [128] on ReRAM; SpinLiM [129] on SOT-MRAM; Ter-LiM [130] on memristor; IMC-CD-TNN [70] on switched-capacitor.
BWN (dense addition): ParaPIM [4] on SOT-MRAM; MRIMA [5] on STT-MRAM.
TWN (sparse addition): the proposed FAT [9] on STT-MRAM.
AdderNet (dense add + sub): the proposed iMAD [131] on STT-MRAM.

IMC accelerators for BNNs receive great research effort thanks to BNNs' simple computation workflow: BNNs replace the 1-bit multiplication with XNOR and the 1-bit accumulation with popcnt (counting the number of "1"s in a binary value) in the binary dot product.…”
Section: In-Memory Computing
Mentioning · confidence: 99%
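A minimal sketch of that binary dot product on bit-packed operands; the encoding (bit 1 for +1, bit 0 for -1) and the helper name are illustrative:

```python
# Binary dot product via XNOR + popcnt: with elements in {-1,+1} encoded as
# bits (1 -> +1, 0 -> -1), multiplication is XNOR and accumulation is a
# population count.
def bnn_dot(x_bits: int, w_bits: int, n: int) -> int:
    """Dot product of two n-element {-1,+1} vectors packed into integers."""
    xnor = ~(x_bits ^ w_bits) & ((1 << n) - 1)  # 1 wherever signs agree
    matches = bin(xnor).count("1")              # popcnt
    return 2 * matches - n                      # (+1)*matches + (-1)*(n-matches)

# Example: x = [+1,-1,+1,-1], w = [+1,+1,-1,-1] -> dot = 1 - 1 - 1 + 1 = 0
x, w = 0b1010, 0b1100
print(bnn_dot(x, w, 4))  # 0
```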
“…FPSA-BNN [121] uses (+1, -1) for weights and (+1, 0) for input neurons so that it can fuse the XNOR, popcnt, and sign function to create a Fully Parallel RRAM Synaptic Array (FPSA), achieving high parallelism by reading out several consecutive rows simultaneously. BDPE [122] integrates a Binary Dot Product Engine (BDPE) inside the CPU for fast and energy-efficient XNOR and popcnt operations utilizing ReRAM. 2T2R-TCAM [66] creates a 2-transistor-2-ReRAM (2T2R) Ternary Content Addressable Memory (TCAM) that supports in-memory logic and XNOR/XOR-based binary dot products.…”
Section: In-Memory Computing
Mentioning · confidence: 99%
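To illustrate the fusion described for FPSA-BNN in arithmetic terms only, note that the sign of the binary dot product reduces to a popcount threshold, so the fused form never materializes a full-precision sum. This is a hedged sketch using the simpler ±1/±1 encoding rather than FPSA-BNN's (+1, 0) input encoding, and it models none of the RRAM array:

```python
# Fused XNOR + popcnt + sign: the binarized output activation is just a
# threshold on the number of sign matches, since
#   sign(2*matches - n) >= 0  <=>  matches >= n/2.
import numpy as np

def fused_xnor_popcnt_sign(x_bits, w_bits):
    """x_bits, w_bits: boolean arrays; returns the binarized activation."""
    matches = np.count_nonzero(~(x_bits ^ w_bits))  # XNOR + popcnt
    return matches >= x_bits.size / 2               # threshold, no full sum

x = np.array([True, False, True, True])   # +1, -1, +1, +1
w = np.array([True, True, True, False])   # +1, +1, +1, -1
print(fused_xnor_popcnt_sign(x, w))        # True: 2 of 4 match, sign(0) -> +1
```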