2020
DOI: 10.48550/arxiv.2001.00705
Preprint

Fractional Skipping: Towards Finer-Grained Dynamic CNN Inference

Abstract: While increasingly deep networks are still in general desired for achieving state-of-the-art performance, for many specific inputs a simpler network might already suffice. Existing works exploited this observation by learning to skip convolutional layers in an input-dependent manner. However, we argue their binary decision scheme, i.e., either fully executing or completely bypassing one layer for a specific input, can be enhanced by introducing finer-grained, softer decisions. We therefore propose a Dynamic Fr…
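The abstract is truncated, but the core idea it describes, replacing a binary execute-or-skip decision per layer with softer, finer-grained choices, can be illustrated with a small hypothetical PyTorch sketch. The gating design, module names, and bit-width options below are illustrative assumptions, not the paper's actual Dynamic Fractional Skipping implementation.

```python
# Hypothetical sketch of finer-grained, input-dependent layer skipping:
# instead of a binary keep/skip gate, a small gating head picks a
# quantization level for each residual block (0 bits = skip entirely).
# Names and the gating design are illustrative, not taken from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


def quantize(x: torch.Tensor, bits: int) -> torch.Tensor:
    """Uniform fake-quantization of activations to `bits` bits."""
    if bits >= 32:                      # treat 32 as "full precision"
        return x
    scale = x.abs().max().clamp(min=1e-8)
    levels = 2 ** bits - 1
    return torch.round(x / scale * levels) / levels * scale


class FractionalSkipBlock(nn.Module):
    """Residual block whose conv path runs at an input-chosen bit-width."""

    def __init__(self, channels: int, bit_options=(0, 4, 8, 32)):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)
        self.bit_options = bit_options
        # Tiny gate: global-average-pooled features -> one logit per option.
        self.gate = nn.Linear(channels, len(bit_options))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        logits = self.gate(x.mean(dim=(2, 3)))   # (N, num_options)
        choice = logits.argmax(dim=1)            # hard per-input choice
        out = torch.zeros_like(x)
        for i, bits in enumerate(self.bit_options):
            mask = choice == i
            if bits == 0 or not mask.any():      # 0 bits = skip the conv path
                continue
            xi = quantize(x[mask], bits)
            out[mask] = F.relu(self.conv(xi))
        return x + out                           # residual connection


if __name__ == "__main__":
    block = FractionalSkipBlock(channels=16)
    y = block(torch.randn(2, 16, 8, 8))
    print(y.shape)  # torch.Size([2, 16, 8, 8])
```

In practice the hard argmax gate would have to be trained with a differentiable relaxation (e.g., Gumbel-softmax) and a computation-cost penalty; that is beyond the scope of this sketch.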

Cited by 2 publications (4 citation statements)
References 19 publications (4 reference statements)
“…A systematic approach to find the correct precision for each layer has been shown in (Wang et al., 2019; Dong et al., 2019; Cai et al., 2020). Dynamic multi-granularity for tensors is also considered as a way of computation saving (Shen et al., 2020). Several quantization schemes have been proposed for training (Wu et al., 2018b; Banner et al., 2018; Das et al., 2018; De Sa et al., 2018; Park et al., 2018).…”
Section: Related Work
confidence: 99%
“…Therefore, mixed-precision DNN accelerators that support versatility in data types are crucial and sometimes mandatory to exploit the benefit of different software optimizations (e.g., low-bit quantization). Moreover, supporting versatility in data types can be leveraged to trade off accuracy for efficiency based on the available resources (Shen et al., 2020). Typically, mixed-precision accelerators are designed based on low-precision arithmetic units, and higher-precision operation can be supported by fusing the low-precision arithmetic units temporally or spatially.…”
Section: Introduction
confidence: 99%
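The "fusing low-precision arithmetic units temporally or spatially" point can be made concrete with a minimal sketch, assuming unsigned 8-bit operands and 4-bit multiplier units; it is an illustration, not the design of any particular accelerator.

```python
# A minimal sketch of "spatially fusing" low-precision units: an 8-bit x 8-bit
# unsigned multiply built from four 4-bit x 4-bit partial products, each of
# which a low-precision multiplier array could compute in parallel.

def mul4(a: int, b: int) -> int:
    """Stand-in for a 4-bit x 4-bit hardware multiplier (operands in [0, 15])."""
    assert 0 <= a < 16 and 0 <= b < 16
    return a * b

def fused_mul8(a: int, b: int) -> int:
    """8-bit x 8-bit unsigned multiply composed from 4-bit partial products."""
    a_hi, a_lo = a >> 4, a & 0xF
    b_hi, b_lo = b >> 4, b & 0xF
    return ((mul4(a_hi, b_hi) << 8)
            + ((mul4(a_hi, b_lo) + mul4(a_lo, b_hi)) << 4)
            + mul4(a_lo, b_lo))

# Sanity check against a plain multiply over the full 8-bit range.
assert all(fused_mul8(a, b) == a * b for a in range(256) for b in range(256))
```

A temporal fusion would instead reuse a single 4-bit unit over four cycles and accumulate the shifted partial products.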
“…Inspired by [37] and following [30], we calculate the computational cost of DNNs using the effective number of MACs, i.e., (# of MACs) × Bit_a/32 × Bit_b/32 for a dot product between a and b, where Bit_a and Bit_b denote the precision of a and b, respectively. As such, this metric is proportional to the total number of bit operations.…”
Section: Design of PFQ
confidence: 99%
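The quoted cost metric is straightforward to reproduce; below is a minimal sketch that applies it to an example convolution layer. The layer dimensions and helper names are illustrative, not taken from the citing paper.

```python
# Effective-MAC metric quoted above: the raw MAC count of a layer is scaled
# by Bit_a/32 and Bit_b/32, so the result is proportional to the number of
# bit operations. Layer shapes below are just an example.

def conv2d_macs(c_in: int, c_out: int, k: int, h_out: int, w_out: int) -> int:
    """Raw multiply-accumulate count of a k x k convolution layer."""
    return c_in * c_out * k * k * h_out * w_out

def effective_macs(num_macs: int, bits_a: int, bits_b: int) -> float:
    """Effective MACs = (# of MACs) * Bit_a/32 * Bit_b/32."""
    return num_macs * (bits_a / 32) * (bits_b / 32)

macs = conv2d_macs(c_in=64, c_out=64, k=3, h_out=56, w_out=56)
print(effective_macs(macs, bits_a=8, bits_b=8))    # 8-bit weights and activations
print(effective_macs(macs, bits_a=32, bits_b=32))  # full precision: equals raw MACs
```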
“…Dynamic/efficient DNN training. More recently, dynamic inference [23, 9, 24, 25, 26, 27, 28, 29] was developed to reduce the average inference cost, and was then extended to the most fine-grained bit level [30, 31]. While energy-efficient training is more complicated than, and different from, inference, many insights from the latter can be lent to the former.…”
Section: Introduction / Prior Work
confidence: 99%