2019
DOI: 10.48550/arxiv.1905.10830
Preprint

Feature Map Transform Coding for Energy-Efficient CNN Inference

Brian Chmiel,
Chaim Baskin,
Ron Banner
et al.

Abstract: Convolutional neural networks (CNNs) achieve state-of-the-art accuracy in a variety of tasks in computer vision and beyond. One of the major obstacles hindering the ubiquitous use of CNNs for inference on low-power edge devices is their relatively high computational complexity and memory bandwidth requirements. The latter often dominates the energy footprint on modern hardware. In this paper, we introduce a lossy transform coding approach, inspired by image and video compression, designed to reduce the memory …
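For intuition, a minimal Python sketch of the general idea described in the abstract follows: decorrelate the channels of an activation tensor with a linear transform and quantize the coefficients before writing them out. The PCA-style basis, the 4-bit step size, and the tensor shape are illustrative assumptions, not the authors' actual pipeline.

import numpy as np

# Illustrative transform coding of one activation tensor (NOT the exact
# method of the paper): channel-wise decorrelating transform + uniform
# quantization of the coefficients, then decode and measure the error.
def transform_code(fmap, bits=4):
    C, H, W = fmap.shape
    X = fmap.reshape(C, -1)                 # one row per channel
    mean = X.mean(axis=1, keepdims=True)
    Xc = X - mean
    U, _, _ = np.linalg.svd(Xc @ Xc.T)      # PCA basis over channels
    Y = U.T @ Xc                            # transform coefficients
    step = (Y.max() - Y.min()) / (2 ** bits - 1)
    Yq = np.round(Y / step)                 # the lossy step
    X_hat = U @ (Yq * step) + mean          # decoder: dequantize + invert
    return X_hat.reshape(C, H, W)

fmap = np.random.randn(64, 14, 14).astype(np.float32)
rec = transform_code(fmap)
print("reconstruction MSE:", float(np.mean((fmap - rec) ** 2)))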


Cited by 2 publications (5 citation statements)
References 30 publications (35 reference statements)
“…Reducing the PE count lowers the compute bound on the roofline, but, at the same time, the use of SRAM increases operation density (i.e., moves the green dots in Figure 13 to the right), possibly within hardware capabilities. Alternative solutions for the memory-bound problem include changing the CNN architecture (for example, using a smaller number of wide layers [46]), or adding a data compression scheme on the way to and from the memory [40,41,47].…”
Section: System-level Design Methodology (mentioning)
confidence: 99%
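The roofline reasoning in the statement above can be made concrete with a short sketch: attainable throughput is the minimum of the peak compute rate and operational intensity times memory bandwidth, so compressing the data moved to and from memory shifts a memory-bound layer to the right on the roofline. All hardware and layer numbers below are hypothetical.

PEAK_OPS = 2e12    # ops/s the PE array can deliver (hypothetical)
DRAM_BW = 25e9     # bytes/s to off-chip memory (hypothetical)

def attainable(ops, bytes_moved):
    intensity = ops / bytes_moved          # "operation density", ops per byte
    return min(PEAK_OPS, intensity * DRAM_BW), intensity

ops, bytes_moved = 1e9, 4e8                # a memory-bound layer
for compression in (1.0, 2.0, 4.0):
    perf, oi = attainable(ops, bytes_moved / compression)
    print(f"{compression:.0f}x compression: {oi:5.1f} ops/B -> {perf / 1e9:6.1f} Gops/s attainable")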
“…Nonetheless, this balance can be achieved in different ways: at the micro-architecture level, at the algorithmic level, or by changing the data representation. The architect may also consider: (1) changing the hardware to provide faster communication (which requires more power and is more expensive), (2) applying communication bandwidth compression algorithms [40,41], (3) using fewer bits to represent weights and activations (using 3-or 4-bit representation may solve the communication problem, at the cost of reducing the expected accuracy), or (4) changing the algorithm to transfer the data slower (even though that solves the bandwidth issue, the possible drawback is reduced throughput of the whole system). The proposed OPS-based roofline model helps the architect to choose between alternatives.…”
Section: Roofline Analysis Examples (mentioning)
confidence: 99%
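A back-of-the-envelope example of option (3) above, lowering the bit width of the activations, for a hypothetical layer shape:

# Activation traffic of a single (made-up) feature map at different bit widths.
batch, channels, height, width = 1, 256, 56, 56
elements = batch * channels * height * width
for bits in (32, 8, 4, 3):
    print(f"{bits:>2}-bit activations: {elements * bits / 8 / 1e6:6.2f} MB per transfer")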
“…Another way to reduce memory bandwidth is by compressing the intermediate activations prior to their transfer to memory with some computationally cheap encoding, such as Huffman (Chandra, 2018; Chmiel et al., 2019) or run-length (RLE) encoding (Cavigelli et al., 2019). A similar approach of storing only nonzero values was utilized by Lin & Lai (2018).…”
Section: Related Work (mentioning)
confidence: 99%
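As a toy illustration of the cheap encodings this statement refers to, the sketch below run-length encodes the zeros of a simulated, roughly 85%-sparse post-ReLU activation vector; it is not the encoder used in any of the cited works.

import numpy as np

def rle_zeros(flat):
    """Encode a 1-D sequence as (preceding_zero_run, nonzero_value) pairs."""
    pairs, run = [], 0
    for v in flat:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    if run:
        pairs.append((run, 0.0))              # trailing run of zeros
    return pairs

mask = np.random.rand(1000) < 0.85             # simulate ~85% zeros
acts = np.where(mask, 0.0, np.random.rand(1000))
pairs = rle_zeros(acts.tolist())
print(f"{acts.size} values -> {len(pairs)} (run, value) pairs "
      f"({2 * len(pairs)} numbers to store)")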
“…A similar approach of storing only nonzero values was utilized by Lin & Lai (2018). (Chmiel et al., 2019)…”
Section: Related Work (mentioning)
confidence: 99%