2015
DOI: 10.1145/2735841

Convolution engine

Abstract: General-purpose processors, while tremendously versatile, pay a huge cost for their flexibility by wasting over 99% of the energy in programmability overheads. We observe that reducing this waste requires tuning data storage and compute structures and their connectivity to the data-flow and data-locality patterns in the algorithms. Hence, by backing off from full programmability and instead targeting key data-flow patterns used in a domain, we can create efficient engines that can be programmed and reused across…

Cited by 22 publications (3 citation statements)
References 11 publications
“…Note that this evaluation targets a specific NN accelerator microarchitecture (i.e., TPU). However, in order to provide a generic evaluation, we focus on investigating the impact of NC-FinFET on the MADD unit that is a core component of any NN accelerator independently of its microarchitecture [4], [5], [17], [19], [30]- [33].…”
Section: A Neural Processing Unit Use Case
mentioning, confidence: 99%
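The statement above treats the multiply-add (MADD) unit as the core compute element of NN accelerators. Below is a minimal sketch of that operation in Python, assuming 8-bit quantized operands and a 32-bit accumulator; these bit-widths are illustrative assumptions, not figures from the cited works.

# Minimal sketch of one multiply-add (MADD) step as used in NN accelerators.
# Operand widths (8-bit inputs, 32-bit accumulator) are illustrative
# assumptions, not parameters of any specific cited design.
def madd(acc: int, a: int, b: int) -> int:
    INT8_MIN, INT8_MAX = -128, 127
    INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1
    assert INT8_MIN <= a <= INT8_MAX and INT8_MIN <= b <= INT8_MAX
    return max(INT32_MIN, min(INT32_MAX, acc + a * b))

def dot(xs, ws):
    # A dot product is just repeated MADD steps on one accumulator.
    acc = 0
    for x, w in zip(xs, ws):
        acc = madd(acc, x, w)
    return acc

print(dot([1, -2, 3], [4, 5, -6]))  # 1*4 + (-2)*5 + 3*(-6) = -24

Repeating this step over a weight/activation vector gives the dot product at the heart of convolution and matrix-multiply layers, which is why the MADD unit's delay and energy largely determine accelerator behavior regardless of the surrounding microarchitecture.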
“…Therefore, NC-FinFET not only improves the energy/speed of the NN inference accelerators but also enables NN developers to rethink their implementations and exploit the higher precision that NC-FinFET delivers to improve the accuracy of their models without trading off for speed and energy. For example, existing NN architectures trade throughput (e.g., [5] combines many MADD units to enable higher computational precision) or speed (e.g., [30] uses 10bit MADD units that are 1.15x slower than the 8-bit ones) to achieve higher inference accuracy. Similarly, [19], [32], [33] apply approximations and trade accuracy to improve the speed and/or energy consumption.…”
Section: Neural Network Inference Evaluation
mentioning, confidence: 99%
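One trade-off quoted above is assembling higher computational precision from several narrow MADD units. A minimal sketch of the standard decomposition follows, building one 16x16-bit product from four 8x8-bit partial products; this is the textbook scheme, not the exact datapath of any cited accelerator.

# Sketch: composing a 16x16-bit unsigned multiply from four 8x8-bit partial
# products, the usual way narrow multipliers are combined for higher precision.
# Illustrative only; not the datapath of any specific cited design.
def mul16_from_mul8(a: int, b: int) -> int:
    assert 0 <= a < 2**16 and 0 <= b < 2**16
    a_hi, a_lo = a >> 8, a & 0xFF
    b_hi, b_lo = b >> 8, b & 0xFF
    # Four narrow products, shifted into place and summed.
    return ((a_hi * b_hi) << 16) + ((a_hi * b_lo) << 8) \
         + ((a_lo * b_hi) << 8) + (a_lo * b_lo)

assert mul16_from_mul8(51234, 60001) == 51234 * 60001

The cost is immediate: four narrow multiplies plus extra additions per wide product, which is the throughput penalty the quoted passage refers to.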
“…Field Programmable Gate Array (FPGA), Application-Specific Integrated Circuits (ASIC), and Graphical Processing Units (GPU) are widely used to accelerate DNNs. The specialized hardware-based DNN accelerators can be categorized into two classes: the first class of accelerators efficiently implements the computational primitives, such as convolutional operations, fully connected operations, etc., for the DNNs [85], [175] and the second class of DNN accelerators efficiently optimize the data movement and memory access [56], [177]. These two generations of specialized hardware-based DNN accelerators improve the speed and energy efficiency of running DNNs.…”
Section: Introduction
mentioning, confidence: 99%
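The first class of accelerators described above, which includes the Convolution Engine, specializes the convolution-style compute primitive itself. Below is a minimal reference sketch of that primitive, a 2D valid-padding, stride-1 convolution written as cross-correlation; it makes no claim about how any particular accelerator stages data or schedules the loops.

# Reference sketch of the 2D convolution-like primitive (valid padding,
# stride 1, written as cross-correlation). Illustrative only; it says nothing
# about how a given accelerator buffers data or orders the loops.
def conv2d_valid(image, kernel):
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = ih - kh + 1, iw - kw + 1
    out = [[0] * ow for _ in range(oh)]
    for y in range(oh):
        for x in range(ow):
            acc = 0
            for ky in range(kh):
                for kx in range(kw):
                    acc += image[y + ky][x + kx] * kernel[ky][kx]
            out[y][x] = acc
    return out

img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
k = [[1, 0], [0, -1]]
print(conv2d_valid(img, k))  # [[-4, -4], [-4, -4]]

The inner loops reuse neighboring image elements heavily, which is the data-flow and data-locality pattern the abstract says specialized engines are tuned to exploit; the second accelerator class attacks the same primitive from the data-movement side.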