Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks

Chen, Yu‐Hsin; Emer, Joel; Sze, Vivienne

doi:10.1109/isca.2016.40

Cited by 658 publications

(947 citation statements)

References 37 publications

Supporting

Mentioning

900

Contrasting

Unclassified

Order By: Relevance

“…There exists a severe contradiction between the complex model and the limited computational resources. Although at present, a large amount of dedicated hardware emerges for deep learning [16,17,18,19,20], providing efficient vector operations to enable fast convolution in forward inference, From the aspect of explainable machine learning, we can summarize that some filters are playing a similar role in the model, especially when the model size is large. So it is reasonable to prune some useless filters or reduce their precision to lower bits.…”

Section: Introductionmentioning

confidence: 99%

Binary neural networks: A survey

Qin

Gong

Liu

et al. 2020

Pattern Recognition

384

171

View full text Add to dashboard Cite

The binary neural network, largely saving the storage and computation, serves as a promising technique for deploying deep models on resource-limited devices.However, the binarization inevitably causes severe information loss, and even worse, its discontinuity brings difficulty to the optimization of the deep network.To address these issues, a variety of algorithms have been proposed, and achieved satisfying progress in recent years. In this paper, we present a comprehensive survey of these algorithms, mainly categorized into the native solutions directly conducting binarization, and the optimized ones using techniques like minimizing the quantization error, improving the network loss function, and reducing the gradient error. We also investigate other practical aspects of binary neural networks such as the hardware-friendly design and the training tricks. Then, we give the evaluation and discussions on different tasks, including image classification, object detection and semantic segmentation. Finally, the challenges that may be faced in future research are prospected.the heavy computation and storage still inevitably limit the applications of the deep CNNs in practice. Besides, due to the huge model parameter space, the prediction of the neural networks is usually viewed as a black-box, which brings great challenges to the interpretability of CNNs. Some works like [21,22,23] empirically explore the function of each layer in the network. They visualize the feature maps extracted by different filters and view each filter as a visual unit focusing on different visual components.of the ResNet-50 [28], and meanwhile save more than 75% of parameters and 50% computational time. In the literature, approaches for compressing the deep networks can be classified into five categories: parameter pruning [26,29,30,31], parameter quantizing [32,33,34,35,36,37,38,39,40,41], low-rank parameter factorization [42,43,44,45,46], transferred/compact convolutional filters [47,48,49,50], and knowledge distillation [51,52,53,54,55,56]. The parameter pruning and quantizing mainly focus on eliminating the redundancy in the model parameters respectively by removing the redundant/uncritical ones or compressing the parameter space (e.g. , from the floating-point weights to the integer ones). Low-rank factorization applies the matrix/tensor decomposition techniques to estimate the informative parameters using the proxy ones of small size. The compact convolutional filter based approaches rely on the carefullydesigned structural convolutional filters to reduce the storage and computation complexity. The knowledge distillation methods try to distill a more compact model to reproduce the output of a larger network.Among the existing network compression techniques, quantization based one serves as a promising and fast solution that yields highly compact models compared to their floating-point counterparts, by representing the network weights with very low precision. Along this direction, the most extreme quantization is binarization, the interest...

show abstract

Section: Introductionmentioning

confidence: 99%

Binary neural networks: A survey

Qin

Gong

Liu

et al. 2020

Pattern Recognition

384

171

View full text Add to dashboard Cite

show abstract

“…Eyeriss [6,7] is a recent ASIC CNN accelerator that couples a compute grid with a NoC, enabling flexibility in scheduling CNN computation. This flexibility limits arithmetic unit underutilization.…”

Section: Related Workmentioning

confidence: 99%

Maximizing CNN Accelerator Efficiency Through Resource Partitioning

ShenYongming

FerdmanMichael

MilderPeter

2017

SIGARCH Comput. Archit. News

131

149

View full text Add to dashboard Cite

Convolutional neural networks (CNNs) are revolutionizing machine learning, but they present significant computational challenges. Recently, many FPGA-based accelerators have been proposed to improve the performance and efficiency of CNNs. Current approaches construct a single processor that computes the CNN layers one at a time; the processor is optimized to maximize the throughput at which the collection of layers is computed. However, this approach leads to inefficient designs because the same processor structure is used to compute CNN layers of radically varying dimensions.We present a new CNN accelerator paradigm and an accompanying automated design methodology that partitions the available FPGA resources into multiple processors, each of which is tailored for a different subset of the CNN convolutional layers. Using the same FPGA resources as a single large processor, multiple smaller specialized processors increase computational efficiency and lead to a higher overall throughput. Our design methodology achieves 3.8x higher throughput than the state-of-the-art approach on evaluating the popular AlexNet CNN on a Xilinx Virtex-7 FPGA. For the more recent SqueezeNet and GoogLeNet, the speedups are 2.2x and 2.0x.

show abstract

“…Moreover, DNN-based applications often require not only high accuracy, but also aggressive hardware performance, including high throughput, low latency, and high energy efficiency. As such, there has been intensive research on DNN accelerators in order to take advantage of different hardware platforms, such as FPGAs and ASICs, for improving DNN acceleration efficiency [9,10,11,12,13,14].…”

Section: Introductionmentioning

confidence: 99%

“…Specifically, Timeloop obtains the number of memory accesses and estimates the latency by calculating the maximum isolated execution cycle across all hardware IPs based on a double-buffering assumption. Accelergy [23] proposes a configuration language to describe hardware architectures and depends on plug-ins, e.g., Timeloop, to calculate the energy as in [14]. The work in [24] adopts Halide [25], a domain-specific language for image processing applications, and proposes a modeling framework which is similar to that of [14].…”

Section: Introductionmentioning

confidence: 99%

DNN-Chip Predictor: An Analytical Performance Predictor for DNN Accelerators with Various Dataflows and Hardware Architectures

Zhao

Wang

et al. 2020

ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

View full text Add to dashboard Cite

The recent breakthroughs in deep neural networks (DNNs) have spurred a tremendously increased demand for DNN accelerators. However, designing DNN accelerators is non-trivial as it often takes months/years and requires cross-disciplinary knowledge. To enable fast and effective DNN accelerator development, we propose DNN-Chip Predictor, an analytical performance predictor which can accurately predict DNN accelerators' energy, throughput, and latency prior to their actual implementation. Our Predictor features two highlights: (1) its analytical performance formulation of DNN ASIC/FPGA accelerators facilitates fast design space exploration and optimization; and (2) it supports DNN accelerators with different algorithm-to-hardware mapping methods (i.e., dataflows) and hardware architectures. Experiment results based on 2 DNN models and 3 different ASIC/FPGA implementations show that our DNN-Chip Predictor's predicted performance differs from those of chip measurements of FPGA/ASIC implementation by no more than 17.66% when using different DNN models, hardware architectures, and dataflows. We will release code upon acceptance.

show abstract

Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks

Cited by 658 publications

References 37 publications

Binary neural networks: A survey

Binary neural networks: A survey

Maximizing CNN Accelerator Efficiency Through Resource Partitioning

DNN-Chip Predictor: An Analytical Performance Predictor for DNN Accelerators with Various Dataflows and Hardware Architectures

Contact Info

Product

Resources

About