2013
DOI: 10.1145/2508148.2485925

Convolution engine

Abstract: This paper focuses on the trade-off between flexibility and efficiency in specialized computing. We observe that specialized units achieve most of their efficiency gains by tuning data storage and compute structures and their connectivity to the data-flow and data-locality patterns in the kernels. Hence, by identifying key data-flow patterns used in a domain, we can create efficient engines that can be programmed and reused across a wide range of applications. We present an example, the Convolution E…

Citations: cited by 30 publications (5 citation statements)
References: 19 publications
“…P_Other corresponds to the remaining elements present in the SW video decoding platform. Finally, we compare the SW video decoding energy to that of the HW video decoding by calculating the ratio between them, using Equation (11).…”
Section: B Performance and Power Consumption Characterization Methodology (mentioning)
confidence: 99%
“…We then assess the impact of P_Other, which corresponds to the power consumed related to the remaining elements present in the platform, such as memory, on the global platform power consumption using Equation (8). Finally, we compare the SW video decoding energy consumption to that of the HW video decoding by calculating the ratio between them, r_sw/hw, using Equation (11). For that, Kimono video test sequence, which possesses the characteristics given in Table II (resolution is 1080p), is decoded as an example.…”
Section: End Of Frame Decoding (mentioning)
confidence: 99%
“…Accelerators such as CNP [45], Convolution Engine [46], Neuflow [47] and TPU [5] have introduced customized logic by leveraging high parallelism and flexibility to efficiently map convolution operations to hardware. Google also introduced TPU Edge [48] processor for inference applications at the edge.…”
Section: Existing DNN Accelerators (mentioning)
confidence: 99%