2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA)
DOI: 10.1109/isca.2016.40

Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks

Abstract: Deep convolutional neural networks (CNNs) are widely used in modern AI systems for their superior accuracy but at the cost of high computational complexity. The complexity comes from the need to simultaneously process hundreds of filters and channels in the high-dimensional convolutions, which involve a significant amount of data movement. Although highly-parallel compute paradigms, such as SIMD/SIMT, effectively address the computation requirement to achieve high throughput, energy consumption still …
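For a sense of scale, a quick back-of-the-envelope count of multiply-accumulates (MACs) in a single convolutional layer uses the standard formula MACs = M x C x R x S x E x F; the layer dimensions below are an assumed AlexNet-CONV2-like example, not figures taken from the abstract.

```python
# MAC count for one convolutional layer: every output pixel of every
# output channel accumulates over all input channels and filter positions.
# Dimensions are an assumed example, not from the cited abstract.
M, C = 256, 96          # output channels (filters), input channels
R, S = 5, 5             # filter height, width
E, F = 27, 27           # output feature-map height, width

macs = M * C * R * S * E * F
print(f"{macs:,} MACs")  # 447,897,600 MACs for this single layer
```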

Cited by 658 publications (947 citation statements)
References 37 publications
“…There exists a severe contradiction between complex models and limited computational resources. Although a large amount of dedicated hardware has emerged for deep learning [16,17,18,19,20], providing efficient vector operations that enable fast convolution in forward inference, from the perspective of explainable machine learning we can observe that some filters play a similar role in the model, especially when the model is large. It is therefore reasonable to prune such redundant filters or reduce their precision to lower bit-widths.…”
Section: Introduction
confidence: 99%
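To make the pruning idea in this statement concrete, here is a minimal sketch of magnitude-based filter pruning in PyTorch. It is not the cited papers' method; the `keep_ratio` parameter and the toy layer dimensions are illustrative assumptions.

```python
# Minimal sketch of magnitude-based filter pruning (illustrative, not the
# cited papers' actual method). Filters with the smallest L1 norms are
# treated as "playing a similar role" and zeroed out.
import torch
import torch.nn as nn

def prune_conv_filters(conv: nn.Conv2d, keep_ratio: float = 0.5) -> None:
    """Zero out the output filters of `conv` with the smallest L1 norms."""
    with torch.no_grad():
        # L1 norm of each filter: shape (out_channels,)
        norms = conv.weight.abs().sum(dim=(1, 2, 3))
        n_keep = max(1, int(keep_ratio * norms.numel()))
        # Indices of the filters to drop (all but the n_keep largest)
        drop = torch.argsort(norms)[:-n_keep]
        conv.weight[drop] = 0.0
        if conv.bias is not None:
            conv.bias[drop] = 0.0

# Usage: prune half of the filters in a toy layer.
layer = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3)
prune_conv_filters(layer, keep_ratio=0.5)
```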
“…Eyeriss [6,7] is a recent ASIC CNN accelerator that couples a compute grid with a NoC, enabling flexibility in scheduling the CNN computation. This flexibility reduces underutilization of the arithmetic units.…”
Section: Related Work
confidence: 99%
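As a back-of-the-envelope illustration of why mapping flexibility matters for utilization, the sketch below (our own example, not Eyeriss's actual row-stationary mapper) compares PE-array utilization when a layer's logical tile is mapped rigidly onto a fixed 12x14 grid versus folding two tiles onto the grid per pass; all dimensions are assumptions.

```python
# Illustrative PE-utilization calculation. A fixed mapping strands PEs when
# the layer's dimensions do not fill the physical array; a flexible mapping
# can fold multiple logical tiles onto the same grid per pass.
import math

def utilization(work_rows: int, work_cols: int,
                grid_rows: int = 12, grid_cols: int = 14) -> float:
    """Fraction of PE cycles doing useful work when a (work_rows x
    work_cols) workload is mapped onto the physical grid tile by tile."""
    passes = math.ceil(work_rows / grid_rows) * math.ceil(work_cols / grid_cols)
    total_pe_slots = passes * grid_rows * grid_cols
    return (work_rows * work_cols) / total_pe_slots

# Fixed mapping: one 5-row logical tile per pass on a 12-row grid.
print(f"{utilization(5, 14):.0%}")   # ~42% of PEs active
# Flexible mapping: fold two 5-row tiles into one pass (10 of 12 rows).
print(f"{utilization(10, 14):.0%}")  # ~83% of PEs active
```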
“…Moreover, DNN-based applications often require not only high accuracy, but also aggressive hardware performance, including high throughput, low latency, and high energy efficiency. As such, there has been intensive research on DNN accelerators in order to take advantage of different hardware platforms, such as FPGAs and ASICs, for improving DNN acceleration efficiency [9,10,11,12,13,14].…”
Section: Introduction
confidence: 99%
“…Specifically, Timeloop obtains the number of memory accesses and estimates the latency by calculating the maximum isolated execution cycle count across all hardware IPs based on a double-buffering assumption. Accelergy [23] proposes a configuration language to describe hardware architectures and depends on plug-ins, e.g., Timeloop, to calculate the energy as in [14]. The work in [24] adopts Halide [25], a domain-specific language for image processing applications, and proposes a modeling framework similar to that of [14].…”
Section: Introduction
confidence: 99%
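To make this latency model concrete, here is a minimal sketch reflecting our reading of the description above, not Timeloop's actual implementation: under double buffering, compute and data transfers for successive tiles overlap, so the steady-state latency per tile is bounded by the slowest component. The component names and cycle counts are hypothetical.

```python
# Sketch of a Timeloop-style latency estimate (our interpretation of the
# description above, not the tool's code). With double buffering, the
# compute and transfer phases of successive tiles overlap, so the
# bottleneck component dominates. All numbers below are hypothetical.
isolated_cycles = {
    "mac_array":  120_000,  # cycles to compute one tile
    "sram_reads":  90_000,  # cycles to stream inputs/weights for one tile
    "dram_fills": 150_000,  # cycles to refill the next tile's buffers
}

latency_per_tile = max(isolated_cycles.values())
print(f"steady-state latency per tile: {latency_per_tile} cycles")  # 150000
```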