2021 Design, Automation & Test in Europe Conference & Exhibition (DATE)
DOI: 10.23919/date51398.2021.9474031

Activation Density based Mixed-Precision Quantization for Energy Efficient Neural Networks

Abstract: As neural networks gain widespread adoption in embedded devices, there is a growing need for model compression techniques to facilitate seamless deployment in resource-constrained environments. Quantization is one of the go-to methods yielding state-of-the-art model compression. Most quantization approaches take a fully trained model, then apply different heuristics to determine the optimal bit-precision for different layers of the network, and finally retrain the network to regain any drop in accuracy. Based o…
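The abstract outlines the usual mixed-precision recipe: measure a per-layer statistic, assign a bit-precision per layer, then retrain. Below is a minimal sketch of how per-layer activation density could drive such a bit-width assignment; the hook-based density measurement, the `assign_bitwidth` thresholds, and the toy model are illustrative assumptions, not the paper's actual algorithm.

```python
import torch
import torch.nn as nn

def activation_density(act: torch.Tensor) -> float:
    """Fraction of non-zero entries in a (post-ReLU) activation tensor."""
    return (act != 0).float().mean().item()

def assign_bitwidth(density: float) -> int:
    """Hypothetical rule: denser activations get more bits.
    Thresholds are illustrative, not taken from the paper."""
    if density > 0.5:
        return 8
    if density > 0.25:
        return 4
    return 2

# Collect densities with forward hooks on the ReLU layers of a toy model.
densities = {}

def make_hook(name):
    def hook(module, inputs, output):
        densities[name] = activation_density(output)
    return hook

model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
)
for name, m in model.named_modules():
    if isinstance(m, nn.ReLU):
        m.register_forward_hook(make_hook(name))

_ = model(torch.randn(1, 3, 32, 32))
bitwidths = {name: assign_bitwidth(d) for name, d in densities.items()}
print(bitwidths)  # e.g. {'1': 8, '3': 4} depending on the random input
```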

Cited by 12 publications (3 citation statements) | References 7 publications
“…In our implementation we use k-WTA to achieve activation sparsity. Another approach is to remove entire channels in convolutional layers during training through a structured pruning process [19,44,73]. In [19] they notice that activations naturally become sparse during training and use a measure of sparsity to gradually prune channels.…”
Section: Accelerating Sparse Network On Other Platforms
Mentioning confidence: 99%
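The citing work above uses k-WTA (k-winners-take-all) to enforce activation sparsity. A minimal sketch of a k-WTA activation, assuming a per-sample top-k over the flattened feature maps (the `sparsity` fraction and layout are illustrative, not the cited implementation):

```python
import torch

def kwta(x: torch.Tensor, sparsity: float = 0.1) -> torch.Tensor:
    """k-winners-take-all: keep the top-k activations per sample, zero the rest.
    `sparsity` is the fraction of units kept; the value is an assumption."""
    flat = x.flatten(start_dim=1)
    k = max(1, int(sparsity * flat.shape[1]))
    topk_vals, _ = flat.topk(k, dim=1)          # values sorted descending
    threshold = topk_vals[:, -1:].expand_as(flat)  # k-th largest per sample
    mask = (flat >= threshold).float()
    return (flat * mask).view_as(x)

x = torch.randn(2, 8, 4, 4)
y = kwta(x, sparsity=0.1)
print((y != 0).float().mean())  # roughly 0.1 of the entries remain non-zero
```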
“…In [19] they notice that activations naturally become sparse during training and use a measure of sparsity to gradually prune channels. In [73] the method is extended to incorporate mixed precision quantization based on activation sparsity and then evaluated on a hardware simulation platform. At a high level our approach is complementary to theirs and can be combined to achieve even greater speedups.…”
Section: Accelerating Sparse Network On Other Platforms
Mentioning confidence: 99%
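The snippet above notes that [73] extends activation-sparsity-based pruning with mixed-precision quantization evaluated in hardware simulation. As a rough illustration of the kind of quantizer a per-layer bit assignment could feed into, here is a generic uniform symmetric fake-quantization routine; this is a hedged sketch, not the specific quantizer used in the cited work.

```python
import torch

def quantize_symmetric(w: torch.Tensor, bits: int) -> torch.Tensor:
    """Uniform symmetric fake-quantization of a tensor to `bits` bits.
    A generic sketch; scale selection and rounding mode are assumptions."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max().clamp(min=1e-8) / qmax
    return (w / scale).round().clamp(-qmax, qmax) * scale

w = torch.randn(32, 16, 3, 3)
w4 = quantize_symmetric(w, bits=4)
w8 = quantize_symmetric(w, bits=8)
# The 8-bit reconstruction error should be noticeably smaller than the 4-bit one.
print((w - w4).abs().mean(), (w - w8).abs().mean())
```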