2018 21st Euromicro Conference on Digital System Design (DSD)
DOI: 10.1109/dsd.2018.00070

CoNNA – Compressed CNN Hardware Accelerator

Cited by 17 publications (39 citation statements) | References 20 publications

“…Another important aspect to consider is the number of parameters of many state-of-the-art CNNs, which can be on the order of millions (see Table VII as an example). CNNs can also require billions of computations to classify a single input instance [81,82]. Moreover, CNNs produce several intermediate feature maps, which must be stored in memory.…”
Section: F. Discussion About Computational Complexity
confidence: 99%
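
To make the scale in the quote above concrete, here is a back-of-the-envelope calculation for a single convolutional layer. The layer shape is a hypothetical VGG-16-style 3×3 convolution, chosen only for illustration; it is not taken from the cited paper.

```python
# Rough parameter and multiply-accumulate (MAC) counts for one conv layer.
# Shapes are illustrative (a VGG-16-style 3x3 conv), not from the cited paper.

def conv_layer_cost(c_in, c_out, k, h_out, w_out):
    """Parameters and MACs of a k x k convolution producing h_out x w_out maps."""
    params = c_out * (c_in * k * k + 1)          # weights + one bias per filter
    macs = c_out * c_in * k * k * h_out * w_out  # one MAC per weight per output pixel
    return params, macs

params, macs = conv_layer_cost(c_in=256, c_out=256, k=3, h_out=56, w_out=56)
fmap_vals = 256 * 56 * 56                        # intermediate feature-map values to buffer
print(f"params:      {params:,}")     # ~0.59 million for this single layer
print(f"MACs:        {macs:,}")       # ~1.85 billion for this single layer
print(f"feature map: {fmap_vals:,}")  # ~0.8 million activations to store
```

A full network stacks many such layers, which is how whole-model totals reach millions of parameters and billions of operations per input.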
“…However, even if compression is applied, the original CNN must first be fully trained, which still demands memory to store intermediate feature maps and computational power to perform all operations. Moreover, pruning and weight quantization may affect the overall accuracy of the CNN [81].…”
Section: F. Discussion About Computational Complexity
confidence: 99%
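
A minimal sketch of the two compression steps this quote refers to, magnitude pruning and uniform quantization. The 50% sparsity target and 5-bit width are illustrative assumptions; both steps are lossy, which is exactly the accuracy risk the quote mentions, and in practice fine-tuning is used to recover accuracy.

```python
import numpy as np

def magnitude_prune(w, sparsity=0.5):
    """Zero out the smallest-magnitude fraction of weights (illustrative)."""
    k = int(sparsity * w.size)
    thresh = np.partition(np.abs(w).ravel(), k)[k]
    return np.where(np.abs(w) < thresh, 0.0, w)

def uniform_quantize(w, bits=5):
    """Symmetric uniform quantization to 2**bits levels (illustrative)."""
    scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
    q = np.clip(np.round(w / scale), -(2 ** (bits - 1)), 2 ** (bits - 1) - 1)
    return q.astype(np.int8), scale  # store small integers + one FP scale per tensor

w = np.random.randn(256, 256).astype(np.float32)
q, scale = uniform_quantize(magnitude_prune(w), bits=5)
err = np.abs(w - q * scale).mean()  # reconstruction error: the accuracy cost
print(f"mean abs error after prune+quantize: {err:.4f}")
```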
“…There is also a CNN accelerator [11] with 8-bit precision in Table 4, whereas our main target for weight compression is 5-bit quantized weights. Although we have presented our technique for 5-bit weights, our arithmetic coding-based encoding and decoding technique can also be used with 8-bit precision CNN accelerators.…”
Section: B. Latency Overhead
confidence: 99%
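
For intuition about the coding principle behind this quote, below is a self-contained arithmetic coder over a stream of quantized weight symbols, written with exact rational arithmetic for clarity rather than speed. The cited paper's hardware decoder is far more elaborate; this is only a sketch of the idea, and the toy weight stream is invented.

```python
from collections import Counter
from fractions import Fraction

def build_model(symbols):
    """Map each symbol to its cumulative-probability interval [lo, hi)."""
    freq, total = Counter(symbols), len(symbols)
    model, cum = {}, Fraction(0)
    for s in sorted(freq):
        p = Fraction(freq[s], total)
        model[s] = (cum, cum + p)
        cum += p
    return model

def encode(symbols, model):
    """Narrow [0, 1) by each symbol's interval, then emit bits of a point inside."""
    low, high = Fraction(0), Fraction(1)
    for s in symbols:
        span = high - low
        lo, hi = model[s]
        low, high = low + span * lo, low + span * hi
    # Shortest bit string whose dyadic interval fits inside [low, high).
    bits, lo_v, hi_v = [], Fraction(0), Fraction(1)
    while not (low <= lo_v and hi_v <= high):
        mid = (lo_v + hi_v) / 2
        if mid <= low:                 # target wholly in the upper half
            bits.append(1); lo_v = mid
        elif mid >= high:              # target wholly in the lower half
            bits.append(0); hi_v = mid
        elif low <= lo_v:              # already above low: shrink from the top
            bits.append(0); hi_v = mid
        else:                          # lock the lower bound above low
            bits.append(1); lo_v = mid
    return bits

def decode(bits, model, n):
    """Replay the interval narrowing to recover n symbols."""
    value = sum(Fraction(b, 2 ** i) for i, b in enumerate(bits, 1))
    low, high, out = Fraction(0), Fraction(1), []
    for _ in range(n):
        t = (value - low) / (high - low)
        for s, (lo, hi) in model.items():
            if lo <= t < hi:
                span = high - low
                low, high = low + span * lo, low + span * hi
                out.append(s)
                break
    return out

weights = [0, 0, 3, 0, 31, 0, 0, 3, 7, 0, 0, 3]   # toy 5-bit weight stream
model = build_model(weights)
bits = encode(weights, model)
assert decode(bits, model, len(weights)) == weights
print(f"{len(weights) * 5} bits raw -> {len(bits)} bits coded")
```

A production coder would use fixed-precision integer renormalization instead of exact fractions; the Fraction version simply keeps the interval-narrowing logic visible.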
“…• We introduce a lossless arithmetic coding-based 5-bit quantized weight compression technique;
• We propose a hardware-based decoder for in-situ decompression of the compressed weights in the NPU or CNN accelerator, and implement the decoder in a field-programmable gate array (FPGA) as a proof of concept;
• Our proposed technique for 5-bit quantized weights reduces the weight size by 9.6× (by up to 112.2× in the case of pruned weights) compared to using uncompressed 32-bit floating-point (FP32) weights;
• Our proposed technique for 5-bit quantized weights also reduces memory energy consumption by 89.2% (by up to 99.1% for pruned weights) compared to using uncompressed FP32 weights;
• When combined with various state-of-the-art CNN accelerators [9] [10] [11], our compression technique and hardware decoder (16 decoding units) incur a small latency overhead of 0.16%-5.48% (0.16%-0.91% for pruned weights) compared to the case without them;
• When combined with various state-of-the-art CNN accelerators [9] [10], our proposed technique with a 4-decoding-unit (DU) decoder reduces system-level energy consumption by 1.1%-9.3% compared to the case without it.…”
Section: Introduction
confidence: 99%
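
A note on the arithmetic behind the 9.6× figure above: plain bit-width reduction from FP32 to 5 bits gives only 32/5 = 6.4×, so exceeding that requires the lossless entropy coding step to exploit the skewed weight histogram. The sketch below computes the Shannon lower bound for a hypothetical, invented histogram to show how the average can drop well below 5 bits per weight.

```python
import math
from collections import Counter

def shannon_bits_per_symbol(counts):
    """Entropy H = -sum(p * log2(p)): the lossless coding lower bound."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

# Hypothetical, heavily skewed 5-bit weight histogram (illustrative only).
hist = Counter({0: 5000, 1: 1500, 31: 1500, 2: 700, 30: 700, 3: 300, 29: 300})
h = shannon_bits_per_symbol(hist)
print(f"entropy: {h:.2f} bits/weight")   # well below the 5-bit container
print(f"vs FP32: {32 / h:.1f}x smaller") # can exceed 32/5 = 6.4x
```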
“…Similar to Argus, SparseNN [17] and Cambricon-x [18] take advantage of skipping zeros in CNN weights. Besides those mentioned, there are many other high-quality architectures in terms of performance, such as Eyeriss v2 [12], ENVISION [18], Thinker [19], UNPU [20], Snowflake [22], Caffeine [23], CoNNa [24], and the architectures in [25]-[27].…”
Section: Introduction
confidence: 99%
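
The zero-skipping idea shared by SparseNN and Cambricon-x is easy to state in software terms: store only the nonzero weights together with their positions, and iterate over those alone. A minimal CSR-style sketch follows; it illustrates the principle only and does not model any particular accelerator's dataflow.

```python
import numpy as np

def to_csr(w):
    """Compress a weight matrix to (values, column indices, row pointers)."""
    vals, cols, rowptr = [], [], [0]
    for row in w:
        nz = np.flatnonzero(row)
        vals.extend(row[nz]); cols.extend(nz)
        rowptr.append(len(vals))
    return np.array(vals), np.array(cols), np.array(rowptr)

def spmv(vals, cols, rowptr, x):
    """Sparse matrix-vector product: MACs happen only on nonzero weights."""
    y = np.zeros(len(rowptr) - 1, dtype=x.dtype)
    for r in range(len(y)):
        for i in range(rowptr[r], rowptr[r + 1]):
            y[r] += vals[i] * x[cols[i]]  # zeros were never stored, so never touched
    return y

w = np.random.randn(64, 64) * (np.random.rand(64, 64) > 0.8)  # ~80% zero weights
x = np.random.randn(64)
vals, cols, rowptr = to_csr(w)
assert np.allclose(spmv(vals, cols, rowptr, x), w @ x)
print(f"MACs: {len(vals)} sparse vs {w.size} dense")
```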