2017
DOI: 10.1109/tcsvt.2016.2592330

Origami: A 803-GOp/s/W Convolutional Network Accelerator

Abstract: An ever-increasing number of computer vision and image/video processing challenges are being approached using deep convolutional neural networks, obtaining state-of-the-art results in object recognition and detection, semantic segmentation, action recognition, optical flow, and super-resolution. Hardware acceleration of these algorithms is essential to adopt these improvements in embedded and mobile computer vision systems. We present a new architecture, design and implementation as well as the first re…

Cited by 178 publications (164 citation statements)
References 42 publications
“…Numerous previous efforts [15][16][17][18][19][20][21][22][23][24][25][26] have proposed solutions for CNN acceleration, but it is difficult to compare their performance directly due to differences in implementation and design choices. In this section, we present a taxonomy of these existing CNN dataflows based on their data handling characteristics.…”
Section: Existing CNN Dataflows (mentioning)
confidence: 99%
“…Many previous papers have proposed specialized CNN dataflows on various platforms, including GPU [14], FPGA [15][16][17][18][19][20][21], and ASIC [22][23][24][25][26]. However, due to differences in technology, hardware resources and system setup, a direct comparison between different implementations does not provide much insight into the relative energy efficiency of different dataflows.…”
Section: Introduction (mentioning)
confidence: 99%
“…The CPU is responsible for only receiving and sending the packets. Cavigelli et al (2015) presented a convolutional network accelerator that is scalable to network sizes that are currently handled by only workstation GPUs, but remains within the power envelope of embedded systems. It can significantly improve the external memory bottleneck of previous architectures, is more area efficient than previously reported results, and comes with the lowest-ever reported power consumption when including I/O power and external memory.…”
Section: Hardware Acceleration (mentioning)
confidence: 99%
“…It can process a matrix multiplication at very high speed. In addition to GPUs, FPGAs [11,12,13] and specific LSIs [14,15,16] have been proposed. By utilizing a specialized hardware structure for CNN, it can achieve higher throughput and operation performance, compared to GPU based approaches.…”
Section: Introduction (mentioning)
confidence: 99%