Published: 2019
DOI: 10.1145/3358205
Memory- and Communication-Aware Model Compression for Distributed Deep Learning Inference on IoT

Abstract: Model compression has emerged as an important area of research for deploying deep learning models on Internet-of-Things (IoT) devices. However, for extremely memory-constrained scenarios, even the compressed models cannot fit within the memory of a single device and, as a result, must be distributed across multiple devices. This leads to a distributed inference paradigm in which memory and communication costs represent a major bottleneck. Yet, existing model compression techniques are not communication-aware. Therefore…
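To make the bottleneck concrete, below is a minimal sketch of NoNN-style distributed inference, assuming the teacher's final-layer features are split across several small student networks, one per device, whose outputs are gathered on a single aggregating device. All sizes, the random weights, and the names (NUM_DEVICES, student_forward, and so on) are illustrative placeholders, not values or code from the paper.

```python
import numpy as np

# Illustrative sizes only; not taken from the paper.
NUM_DEVICES = 4             # one student network per IoT device
FEATURES_PER_STUDENT = 128  # features each student contributes
NUM_CLASSES = 10

rng = np.random.default_rng(0)
x = rng.standard_normal(3 * 32 * 32).astype(np.float32)  # one input image

def student_forward(x):
    """Stand-in for a student's forward pass: each device computes only
    its own slice of the teacher's final-layer features."""
    w = rng.standard_normal((x.size, FEATURES_PER_STUDENT)).astype(np.float32)
    return np.maximum(x @ w, 0.0)  # ReLU features

# Each device runs locally; only its small feature vector crosses the
# network, never the full intermediate activations.
parts = [student_forward(x) for _ in range(NUM_DEVICES)]
features = np.concatenate(parts)  # gathered on one aggregating device

# The final classifier runs where the parts are gathered.
w_fc = rng.standard_normal((features.size, NUM_CLASSES)).astype(np.float32)
logits = features @ w_fc

bytes_per_inference = sum(p.nbytes for p in parts)
print(f"communication per inference: {bytes_per_inference} bytes")  # 2048
```

Under these assumptions, the per-inference traffic is just NUM_DEVICES small feature vectors, which is why the partitioning of filters across devices, rather than the per-device model size alone, dominates the communication cost.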

Cited by 38 publications (41 citation statements)
References 21 publications
“…With ever-more devices available on the edge, this network of devices must be exploited to improve model accuracy without increasing the communication latency. Towards this, we describe our recent work on Network-of-Neural Networks (NoNN) [1] for memory- and communication-aware model compression.…”
Section: Communication-Aware Model Compression
Confidence: 99%
“…These patterns of activations reveal how knowledge learned by the teacher network is distributed at the final convolution layer. Therefore, we first use these patterns to create a filter activation network [1] which represents how knowledge is distributed across multiple filters (see Fig. 3(b)).…”
Section: Network-of-Neural Networks
Confidence: 99%
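As a rough illustration of the step that citation describes, the sketch below builds a filter activation network from synthetic final-layer activations and partitions it so that each group of co-activating filters could be distilled into one student. The top-decile activation threshold, the graph sizes, and the use of networkx's modularity-based community detection are assumptions standing in for the paper's exact procedure.

```python
import numpy as np
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

rng = np.random.default_rng(0)

# Stand-in for teacher activations at the final convolution layer
# (num_samples x num_filters). In practice these would come from
# running the trained teacher over a calibration set.
NUM_SAMPLES, NUM_FILTERS = 500, 64
acts = np.abs(rng.standard_normal((NUM_SAMPLES, NUM_FILTERS)))

# Filter activation network: nodes are filters, edge weights count how
# often two filters are strongly active on the same input. The
# top-decile threshold is an assumption made for this sketch.
active = (acts > np.quantile(acts, 0.9, axis=0)).astype(float)
co_act = active.T @ active

G = nx.Graph()
for i in range(NUM_FILTERS):
    for j in range(i + 1, NUM_FILTERS):
        if co_act[i, j] > 0:
            G.add_edge(i, j, weight=co_act[i, j])

# Partition the graph so each group of filters can be distilled into
# one student network on its own device. Modularity-based community
# detection is used here as a stand-in for the paper's partitioner.
for k, community in enumerate(greedy_modularity_communities(G, weight="weight")):
    print(f"student {k}: mimics {len(community)} teacher filters")
```

Because the partition groups filters that tend to fire together, each student can mimic its group's knowledge largely independently, which is what keeps cross-device communication confined to the final feature exchange.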