2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2018
DOI: 10.1109/cvprw.2018.00221
Efficient Deep Learning Inference Based on Model Compression

Cited by 11 publications (6 citation statements)
References 14 publications
“…Our high fps is a consequence of our linear runtime complexity, and we validate our theoretical claims in Section V. We further hypothesize that prior deep learning-based methods [37], [14] are less optimal in terms of runtime due to the intensive computation requirements of deep neural networks [51], [45]. For example, ResNet [18] needs more than 25 MB to store the computed model in memory, and more than 4 billion floating-point operations (FLOPs) to process a single image of size 224×224 [51].…”
Section: Discussion (supporting)
confidence: 72%
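To make the scale of such costs concrete, here is a minimal back-of-the-envelope sketch (an illustration added for this summary, not a computation from the cited papers) of how parameter storage and FLOP counts are typically estimated for a single convolutional layer; all layer dimensions below are assumptions chosen for illustration.

```python
# Minimal sketch: per-layer storage and compute cost of a 2D convolution.
# Illustrative only; not a measurement from the cited works.

def conv2d_cost(c_in, c_out, k, h_out, w_out):
    """Return (num_params, num_flops) for a k x k convolution layer."""
    params = c_out * c_in * k * k          # weight count (bias omitted)
    flops = 2 * params * h_out * w_out     # one multiply + one add per weight per output pixel
    return params, flops

if __name__ == "__main__":
    # First 7x7, stride-2 conv of a ResNet-style network on a 224x224 RGB image
    # (output resolution 112x112); numbers are purely illustrative.
    p, f = conv2d_cost(c_in=3, c_out=64, k=7, h_out=112, w_out=112)
    print(f"params: {p:,} ({p * 4 / 1e6:.2f} MB at float32)")
    print(f"FLOPs : {f / 1e9:.2f} GFLOPs for this single layer")
```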
“…In contrast, structured pruning is more hardware-friendly and efficient on various off-the-shelf deployment platforms, simultaneously speeding up network inference and reducing the memory overhead of CNNs. It can be further categorized into greedy-based pruning [25], [27], [30], [44], [52], [58], [60], [81], search-based pruning [17], [26], [54], [57], dynamic pruning [7], [12], [47], [63], [73], [76], and sparsity regularization-based pruning [31], [42], [45], [46], [51], [53], [56], [75], [78], [80], [84].…”
Section: Related Work, A. Network Pruning (mentioning)
confidence: 99%
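As a concrete illustration of the first of these families, below is a minimal sketch of greedy, L1-norm-based structured (filter-level) pruning. It is a toy example under assumed shapes and a made-up pruning ratio, not the specific method of any cited reference.

```python
# Toy structured (filter-level) pruning by L1 norm of each output filter.
import numpy as np

def prune_filters_l1(weights, prune_ratio=0.5):
    """weights: (c_out, c_in, k, k) conv kernel. Returns kept weights and kept indices."""
    scores = np.abs(weights).reshape(weights.shape[0], -1).sum(axis=1)  # L1 norm per filter
    n_keep = max(1, int(round(weights.shape[0] * (1.0 - prune_ratio))))
    keep = np.sort(np.argsort(scores)[-n_keep:])  # keep filters with the largest norms
    return weights[keep], keep

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.standard_normal((64, 32, 3, 3)).astype(np.float32)
    w_pruned, kept = prune_filters_l1(w, prune_ratio=0.5)
    print(w.shape, "->", w_pruned.shape)  # (64, 32, 3, 3) -> (32, 32, 3, 3)
```

Removing whole filters in this way shrinks the layer's output channels, so the next layer's input channels shrink too, which is what makes structured pruning directly usable on off-the-shelf hardware.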
“…The pruning results of ResNeXt-29 are shown in Table 5. By adding edge sparsity regularization, edge-level pruning [84] achieves an increase in error of 0.16% with 55.4% and 28.4% pruning rates in terms of the FLOPs and parameters, respectively. Compared to edge-level pruning, our OED removes 5 out of 9 residual blocks, achieving a higher parameter pruning rate of 58.5% (vs. 28.4%), with a slightly lower classification error of 4.08% (vs. 4.11%).…”
Section: B. ResNeXt-29 (mentioning)
confidence: 99%
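For readers unfamiliar with how such percentages are reported, the FLOPs and parameter "pruning rates" quoted above are simply one minus the remaining cost over the original cost; the sketch below illustrates this with made-up numbers, not figures from the cited papers.

```python
# Pruning rate = 1 - (cost after pruning / cost before pruning).
# The operand values below are illustrative placeholders only.
def pruning_rate(original, remaining):
    return 1.0 - remaining / original

if __name__ == "__main__":
    print(f"{pruning_rate(original=1.0e9, remaining=0.45e9):.1%} of FLOPs pruned")
    print(f"{pruning_rate(original=34.4e6, remaining=14.3e6):.1%} of parameters pruned")
```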
“…As edge inferencing [43] is as important as model training, research has also focused on reducing inference latency locally rather than connecting edge devices to a cloud server for inferencing. To enable a model to run efficiently on an edge device such as an embedded device, model compression [44] is used to reduce model size and complexity. Vanhoucke et al.…”
Section: Implementation on Hardware Acceleration (mentioning)
confidence: 99%
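One widely used compression technique for edge inference is post-training weight quantization. The sketch below shows symmetric per-tensor 8-bit quantization as a generic illustration; it is not the method of reference [44], and the tensor shape and names are assumptions.

```python
# Minimal sketch: symmetric 8-bit post-training quantization of a weight tensor.
import numpy as np

def quantize_int8(w):
    """Map float32 weights to int8 plus a per-tensor scale factor."""
    max_abs = float(np.abs(w).max())
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.standard_normal((256, 256)).astype(np.float32)
    q, s = quantize_int8(w)
    err = np.abs(w - dequantize(q, s)).max()
    print(f"storage: {w.nbytes} B -> {q.nbytes} B (4x smaller), max abs error {err:.4f}")
```

The 4x storage reduction comes purely from replacing 32-bit floats with 8-bit integers; on hardware with int8 arithmetic support it can also reduce inference latency and energy use.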