Vesti: Energy-Efficient In-Memory Computing Accelerator for Deep Neural Networks

Yin, Shihui; Jiang, Zhewei; Kim, Minkyu; Gupta, Tushar; Seok, Mingoo; Seo, Jae-sun

doi:10.1109/tvlsi.2019.2940649

Cited by 40 publications

(14 citation statements)

References 30 publications


“…NDP proposals have explored many memory architectures. In the literature, SRAM-based NDP proposals mostly aim to insert logic capabilities to the host's cache memories or to the host's memory controllers [17,19,20,21,22,23,24,58,59,103,106]. This work modify the cache hierarchy trying to avoid moving data from the main memory and cache memories to the host's core.…”

Section: B Memory Architectures and NDP

mentioning

confidence: 99%

“…Wang et al [20] extend Neural Cache by adding techniques to leverage sparsity-awareness, NN redundancy, and add new efficient compute algorithms for binary and ternary neural networks. Yin et al [21] part from the same base idea as Neural Cache, but enable more scalability by using XNOR-Accumulate operations to enable activation of multiple SRAM rows, double buffers to hide in-memory reprogramming latencies, and additional peripheral logic for multi-bit activation. Ramanathan et al [22] propose the BFree, a bit-line free LUT-based NDP in SRAM subarrays that allows reconfigurable precision and NN layout.…”

Section: Introduction

mentioning

confidence: 99%

“…We could observe that most NCA proposals focus on…” [Flattened table residue: rows for references [24], [34], [40], [42], [19], [36], [35], [43], [45], [20], [21], [38], [46], and [22]; the column headings and cell marks are not recoverable.]

Section: Introduction

mentioning

confidence: 99%


Survey on Near-Data Processing: Applications and Architectures

Santos, Moreira, Cordeiro, et al. 2021. JICS


One of the main challenges for modern processors is data transfer between processor and memory. Such data movement incurs high latency and high energy consumption. In this context, Near-Data Processing (NDP) proposals have started to gain acceptance as accelerator devices. Such proposals alleviate the memory bottleneck by moving instructions to where the data resides. The first proposals date back to the 1990s, but it was only in the 2010s that the number of papers addressing NDP increased, coinciding with the appearance of 3D-stacked chips with stacked logic and memory layers. This survey presents a brief history of these accelerators, focusing on the application domains migrated to near-data and the proposed architectures. We also introduce a new taxonomy to classify such architectural proposals according to their data distance.


“…A potential manner in which AI can be implemented in IoT devices is via a binarized CNN (BCNN) [3], where network parameters are expressed in a binary format with an inference accuracy comparable to that of the original CNN. Several researches have been conducted on implementing BCNN accelerators using various hardware platforms such as GPUs [4], ASICs [5][6][7][8][9][10][11][12][13][14][15][16], and FPGAs [17][18][19][20][21][22][23][24].…”

Section: Introduction

mentioning

confidence: 99%

Design framework for an energy-efficient binary convolutional neural network accelerator based on nonvolatile logic

Suzuki, Oka, Tamakoshi, et al. 2021. NOLTA


Convolutional neural network (CNN) accelerators, particularly binarized CNN (BCNN) accelerators, have proven to be effective for several artificial-intelligence-oriented applications; however, their energy efficiency should be further improved for edge applications. In this paper, a design framework for an energy-efficient BCNN accelerator based on nonvolatile logic is presented. Designing BCNN accelerators using nonvolatile logic allows the accelerators to exhibit massively parallel operation and ultra-low standby power. Thus, a new design can be realized for accelerators that is different from that of conventional accelerators based solely on CMOS. Considering this, we discuss concrete design considerations for nonvolatile BCNN accelerators. A systematic design flow for the nonvolatile BCNN is established by combining Vivado HLS and standard electronic design automation tools. As a typical design example, a BCNN accelerator for inference on the 32 × 32 pixel MNIST dataset is designed using a 65-nm CMOS technology. According to the logic-synthesis results, the proposed BCNN accelerator is estimated to consume 94.2% less power than a conventional BCNN accelerator when the frame rate is 30 frames per second.


“…In M3D integration, there was a concern about the thermal problem induced by high power density and low heat dissipation capability. Thus, we evaluated the peak temperature of each BNN_Accel with HotSpot 6.0 [15]. For a conservative thermal evaluation, each BNN_Accel was placed beside the big CPU core cluster consuming 4.0 W, which was the thermal design power (TDP) of the cluster [16].…”

mentioning

confidence: 99%

A System-Level Exploration of Binary Neural Network Accelerators with Monolithic 3D Based Compute-in-Memory SRAM

2021


Binary neural networks (BNNs) are well suited to energy-constrained embedded systems thanks to their binarized parameters. Several researchers have proposed compute-in-memory (CiM) SRAMs for XNOR-and-accumulation computations (XACs) in BNNs by adding transistors to the conventional 6T SRAM, which reduces the latency and energy of data movement. However, due to the additional transistors, the CiM SRAMs suffer from larger area and longer wires than the conventional 6T SRAMs. Meanwhile, monolithic 3D (M3D) integration enables fine-grained 3D integration, reducing the 2D wire length in small functional units. In this paper, we propose a BNN accelerator (BNN_Accel), composed of a 9T CiM SRAM (CiM_SRAM), input buffer, and global periphery logic, to execute the computations in the binarized convolution layers of BNNs. We also propose CiM_SRAM with the subarray-level M3D integration (as well as the transistor-level M3D integration), which reduces the wire latency and energy compared to the 2D planar CiM_SRAM. Across the binarized convolution layers, our simulation results show that BNN_Accel with the 4-layer CiM_SRAM reduces the average execution time and energy by 39.9% and 23.2%, respectively, compared to BNN_Accel with the 2D planar CiM_SRAM.
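The XNOR-and-accumulation computation named in this abstract can be illustrated in software. The following is a minimal sketch, not the circuit or code of any paper listed here: it assumes the common encoding in which bit 1 stands for +1 and bit 0 for -1, and the function names (`pack`, `xnor_accumulate`, `reference_dot`) are hypothetical. Under that assumption, the dot product of two length-n vectors equals twice the number of matching bit positions minus n.

```python
from typing import Sequence


def pack(values: Sequence[int]) -> int:
    """Pack a sequence of +1/-1 values into an integer (assumed encoding: +1 -> bit 1, -1 -> bit 0)."""
    bits = 0
    for i, v in enumerate(values):
        if v == 1:
            bits |= 1 << i
    return bits


def xnor_accumulate(a_bits: int, w_bits: int, n: int) -> int:
    """Dot product of two packed length-n +1/-1 vectors via XNOR and popcount."""
    mask = (1 << n) - 1
    # XNOR is 1 wherever the two bits agree, i.e. where the product is +1.
    matches = bin(~(a_bits ^ w_bits) & mask).count("1")
    # Each match contributes +1 and each mismatch -1: matches - (n - matches).
    return 2 * matches - n


def reference_dot(a: Sequence[int], w: Sequence[int]) -> int:
    """Plain multiply-accumulate, used only to check the bitwise version."""
    return sum(x * y for x, y in zip(a, w))


activations = [1, -1, 1, 1]
weights = [1, 1, -1, 1]
result = xnor_accumulate(pack(activations), pack(weights), len(activations))
```

Here `result` agrees with `reference_dot(activations, weights)`; replacing multiplications with XNOR and a bit count is what makes this operation attractive to evaluate inside a memory array.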
