2022
DOI: 10.1109/tpami.2021.3089687
GRIM: A General, Real-Time Deep Learning Inference Framework for Mobile Devices Based on Fine-Grained Structured Weight Sparsity

Cited by 11 publications (4 citation statements). References 45 publications.
“…Many companies have developed dedicated Neural Processing Units (NPUs) for mobile devices, which can process many trained Deep Neural Networks (DNNs) based applications in real-time. Even though training a DNN may suffer from a long delay, testing the network can be done in real-time [3,4]. Various methods have been proposed in the literature that focused on leveraging neural network based coding models for image and video compression.…”
Section: Introduction
confidence: 99%
“…RigL [90], ITOP [91], SET [104], DSR [89], and MEST [86], is provided in Tab. The three main sparsity schemes introduced in the area of network pruning consist of unstructured [105][106][107], structured [3,45,[108][109][110][111][112][113][114][115][116][117][118][119], and fine-grained structured pruning [120][121][122][123][124][125][126][127][128][129].…”
Section: Discussion
confidence: 99%
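The distinction drawn in that citation statement can be made concrete. The sketch below is an illustrative example of fine-grained structured sparsity (not GRIM's actual implementation): weights are pruned inside fixed-size blocks so that each block keeps a regular pattern of nonzeros that hardware can exploit, unlike fully unstructured pruning. The function name and block parameters are assumptions for illustration.

```python
import numpy as np

def block_prune(weights, block=4, keep=1):
    """Within every `block` consecutive weights of a row, keep only the
    `keep` largest-magnitude entries and zero the rest."""
    w = weights.reshape(-1, block).copy()
    # Indices of the smallest (block - keep) magnitudes in each block.
    drop = np.argsort(np.abs(w), axis=1)[:, : block - keep]
    np.put_along_axis(w, drop, 0.0, axis=1)
    return w.reshape(weights.shape)

rng = np.random.default_rng(0)
W = rng.normal(size=(2, 8))
Wp = block_prune(W, block=4, keep=1)
# Each group of 4 weights now holds exactly one nonzero entry:
# 75% sparsity with a regular 1-in-4 pattern, rather than
# arbitrarily scattered zeros.
print((Wp != 0).sum(axis=1))  # → [2 2]
```

The regularity is the point: a 1-in-4 pattern lets an accelerator skip zeros with fixed-stride indexing, which unstructured sparsity cannot guarantee.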
“…As a result, processing one bit of all active inputs requires 128 × (1/1.2 GHz) = 106.6 ns. Instead, FORMS employs four 4-bit ADCs (within the same area as one 8-bit ADC, but at 1.8× higher frequency) to compute 128 dot-products, which results in a cycle time of (128/4) × (1/2.1 GHz) ≈ 15 ns. As a result, FORMS improves the cycle time, which helps increase the throughput.…”
Section: Overall Architecture
confidence: 99%
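The cycle-time arithmetic quoted in that statement can be checked directly. This is a minimal sketch of the calculation only; the helper function name is an assumption, not anything from the paper.

```python
def cycle_time_ns(dot_products, adcs, freq_ghz):
    """Time to serialize `dot_products` conversions across `adcs` ADCs
    running at `freq_ghz` GHz, in nanoseconds (1/GHz = 1 ns)."""
    return (dot_products / adcs) * (1.0 / freq_ghz)

# Baseline: one 8-bit ADC at 1.2 GHz handling all 128 conversions.
baseline = cycle_time_ns(128, 1, 1.2)
# FORMS-style: four 4-bit ADCs at 2.1 GHz sharing the 128 dot-products.
forms = cycle_time_ns(128, 4, 2.1)
print(round(baseline, 1), round(forms, 1))  # → 106.7 15.2
```

This reproduces the quoted 106.6 ns and ~15 ns figures (up to rounding), a roughly 7× cycle-time improvement.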