2019
DOI: 10.48550/arxiv.1907.11804
Preprint

Memory- and Communication-Aware Model Compression for Distributed Deep Learning Inference on IoT

Cited by 2 publications (10 citation statements). References 0 publications.
“…With ever-more devices available on the edge, this network of devices must be exploited to improve model accuracy without increasing the communication latency. Towards this, we describe our recent work on Network-of-Neural Networks (NoNN) [1] for memory- and communication-aware model compression.…”
Section: Communication-aware Model Compression (mentioning, confidence: 99%)
“…These patterns of activations reveal how knowledge learned by the teacher network is distributed at the final convolution layer. Therefore, we first use these patterns to create a filter activation network [1] which represents how knowledge is distributed across multiple filters (see Fig. 3(b)).…”
Section: Network-of-Neural Networks (mentioning, confidence: 99%)
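The filter activation network described in this citation lends itself to a brief illustration. The following is a minimal sketch, assuming the teacher's final-convolution activations are available as a (samples × filters) array; it builds a filter co-activation graph and splits it with off-the-shelf greedy modularity communities as a stand-in for the network-science partitioner of NoNN [1]. All names, thresholds, and the choice of community method are illustrative assumptions, not the paper's exact construction.

```python
# Hypothetical sketch of a filter "activation network": edge weights count how
# often two final-conv filters are simultaneously active, and communities give
# one filter group per student/edge device. Not the exact NoNN [1] procedure.
import numpy as np
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

def build_activation_network(activations):
    """activations: (num_samples, num_filters) pooled final-conv outputs."""
    # A filter counts as "active" on a sample if it exceeds its own mean response.
    active = (activations > activations.mean(axis=0, keepdims=True)).astype(np.int32)
    co_activation = active.T @ active          # (num_filters, num_filters) counts
    g = nx.Graph()
    n = co_activation.shape[0]
    for i in range(n):
        for j in range(i + 1, n):
            if co_activation[i, j] > 0:
                g.add_edge(i, j, weight=int(co_activation[i, j]))
    return g

def partition_filters(g, n_parts):
    """One community per student/device; best_n requires networkx >= 2.8."""
    communities = greedy_modularity_communities(
        g, weight="weight", cutoff=n_parts, best_n=n_parts)
    return [set(c) for c in communities]

# Example: 1000 samples, 128 final-conv filters, split across 4 edge devices.
acts = np.random.rand(1000, 128)
parts = partition_filters(build_activation_network(acts), n_parts=4)
print([len(p) for p in parts])
```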
“…Different methods have been developed [207], [208] to partition a pre-trained DNN over several mobile devices in order to accelerate DNN inference on devices. Bhardwaj et al. [209] further considered memory and communication costs in this distributed inference architecture, for which model compression and a network-science-based knowledge partitioning algorithm are proposed to address these issues. For a robotics system where the model is partitioned between the edge server and the robot, the robot should take both local computation accuracy and offloading latency into account; this offloading problem was formulated in [210] as a sequential decision-making problem that is solved by a deep reinforcement learning algorithm.…”
Section: Computation Offloading Based Edge Inference Systems (mentioning, confidence: 99%)
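The device/server partitioning mentioned in this citation can be illustrated with a short sketch, assuming PyTorch, a torchvision ResNet-18, and an arbitrarily chosen split point (none of these come from [207]-[210]): the device runs the early layers and transmits a compact intermediate tensor, and the edge server completes the forward pass. Moving the split point earlier or later trades device compute against the size of the transmitted activation.

```python
# Minimal device/server split of a pre-trained-style DNN. The split index and
# the use of ResNet-18 are illustrative assumptions, not the cited methods.
import torch
import torch.nn as nn
from torchvision.models import resnet18

full = resnet18(weights=None).eval()     # weights=None: no download (torchvision >= 0.13)
layers = list(full.children())           # conv1, bn1, relu, maxpool, layer1..4, avgpool, fc
split = 6                                # hypothetical split point, after layer2

device_head = nn.Sequential(*layers[:split])            # runs on the robot / IoT device
server_tail_conv = nn.Sequential(*layers[split:-1])     # runs on the edge server
server_fc = layers[-1]                                   # final classifier on the server

with torch.no_grad():
    x = torch.randn(1, 3, 224, 224)      # input captured on the device
    feat = device_head(x)                # intermediate activation sent over the network
    out = server_fc(torch.flatten(server_tail_conv(feat), 1))

# Communication cost per inference is roughly feat.numel() * bytes per element.
print(feat.shape, out.shape)
```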