Abstract: Model compression has emerged as an important area of research for deploying deep learning models on Internet-of-Things (IoT) devices. However, for extremely memory-constrained scenarios, even the compressed models cannot fit within the memory of a single device and, as a result, must be distributed across multiple devices. This leads to a distributed inference paradigm in which memory and communication costs represent a major bottleneck. Yet, existing model compression techniques are not communication-aware. Therefore, …
“…With ever-more devices available on the edge, this network of devices must be exploited to improve model accuracy without increasing the communication latency. Towards this, we describe our recent work on Network-of-Neural Networks (NoNN) [1] for memory- and communication-aware model compression.…”
Section: Communication-aware Model Compression
“…These patterns of activations reveal how knowledge learned by the teacher network is distributed at the final convolution layer. Therefore, we first use these patterns to create a filter activation network [1] which represents how knowledge is distributed across multiple filters (see Fig. 3(b)).…”
Section: Network-of-Neural Networks
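The filter activation network described above can be sketched as a co-activation graph over the teacher's final-convolution filters. The binarization rule and co-activation statistic below are illustrative assumptions, not NoNN's exact importance metric (see [1] for the actual construction):

```python
import numpy as np

def filter_activation_network(activations: np.ndarray) -> np.ndarray:
    """Build a filter co-activation graph.

    activations: (num_samples, num_filters) average-pooled responses from
    the teacher's final convolution layer (hypothetical input format).
    """
    # Binarize: call a filter "active" on a sample if its response
    # exceeds that filter's mean response (an assumed rule).
    active = activations > activations.mean(axis=0, keepdims=True)
    # Edge weight = how often two filters fire together across samples.
    co_activation = active.T.astype(float) @ active.astype(float)
    np.fill_diagonal(co_activation, 0.0)  # no self-loops
    return co_activation

acts = np.random.rand(1000, 64)          # e.g. 1000 samples, 64 filters
graph = filter_activation_network(acts)  # 64x64 symmetric adjacency matrix
```

The resulting symmetric adjacency matrix is what a network-partitioning step can then split into disjoint filter groups.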
“…3(b)). We then partition this network into disjoint subsets via community detection [10], a network partitioning technique (see [1] for details).…”
Section: Network-of-Neural Networks
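Partitioning such a filter graph into disjoint subsets can be illustrated with a numpy-only spectral bisection (sign of the Fiedler vector). NoNN itself uses a dedicated community-detection algorithm [10], so this stand-in is only a sketch of the partitioning idea on an assumed 16-filter adjacency:

```python
import numpy as np

# Assumed input: a symmetric co-activation adjacency with zero diagonal.
rng = np.random.default_rng(0)
upper = np.triu(rng.random((16, 16)), 1)
adj = upper + upper.T

# Graph Laplacian L = D - A; its second-smallest eigenvector (Fiedler
# vector) gives a classic two-way cut of the graph.
degree = np.diag(adj.sum(axis=1))
laplacian = degree - adj
eigvals, eigvecs = np.linalg.eigh(laplacian)   # eigenvalues ascending
fiedler = eigvecs[:, 1]

part_a = np.flatnonzero(fiedler >= 0)  # filters for one student module
part_b = np.flatnonzero(fiedler < 0)   # disjoint filters for the other
```

Each disjoint subset of filters then becomes the target knowledge for one small student network deployed on its own device.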
“…However, due to their enormous computational complexity, deploying such models on constrained devices has emerged as a critical bottleneck for large-scale adoption of intelligence at the IoT edge. It has been estimated that the number of connected IoT-devices will reach one trillion across various market segments by 2035; this provides us a unique opportunity for integrating widespread intelligence in edge devices. Such an exponential growth in IoT-devices necessitates new breakthroughs in Artificial Intelligence research that can more effectively deploy learning at the edge and, therefore, truly exploit the setup of trillions of IoT-devices.…”
Section: Introduction
“…Finally, since IoT naturally implies a network of connected devices, it automatically opens the door to a new class of problems: communication-aware model compression [1]. For instance, many smart home/city applications can have numerous connected IoT-sensors with, say, 500KB total memory per node.…”
The significant computational requirements of deep learning present a major bottleneck for its large-scale adoption on hardware-constrained IoT-devices. Here, we envision a new paradigm called EdgeAI to address major impediments associated with deploying deep networks at the edge. Specifically, we discuss the existing directions in computation-aware deep learning and describe two new challenges in the IoT era: (1) Data-independent deployment of learning, and (2) Communication-aware distributed inference. We further present new directions from our recent research to alleviate the latter two challenges. Overcoming these challenges is crucial for rapid adoption of learning on IoT-devices in order to truly enable EdgeAI.
Internet of Things (IoT) edge intelligence has emerged by optimizing the deep learning (DL) models deployed on resource-constrained devices for quick decision-making. In addition, edge intelligence reduces network overload and latency by bringing intelligent analytics closer to the source. On the other hand, DL models need a lot of computing resources. As a result, they have high computational workloads and memory footprints, making them impractical to deploy and execute on IoT edge devices with limited capabilities. In addition, existing layer-based partitioning methods generate many intermediate results, resulting in a huge memory footprint. In this article, we propose a framework that provides a comprehensive solution enabling the deployment of convolutional neural networks (CNNs) onto distributed IoT devices for faster inference and a reduced memory footprint. This framework considers a pre-trained YOLOv2 model and applies a weight-pruning technique to remove non-contributing parameters. We use the fused-layer partitioning method to vertically partition the fused layers of the CNN and then distribute the partitions among the edge devices to process the input. In our experiments, we considered multiple Raspberry Pi boards as edge devices; a Raspberry Pi with a neural compute stick acts as a gateway device that combines the results from the edge devices to produce the final output. Our proposed model achieved an inference latency of 5 to ~7 seconds for 3×3 to 5×5 fused-layer partitioning across five devices, with a 9% improvement in memory footprint.
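As one illustration of the weight-pruning step, a common magnitude-based criterion zeroes out the fraction of weights smallest in absolute value; the article's exact pruning criterion for YOLOv2 may differ, and the function name and sparsity level below are assumptions for the sketch:

```python
import numpy as np

def prune_by_magnitude(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the `sparsity` fraction of weights smallest in magnitude."""
    k = int(sparsity * weights.size)
    if k == 0:
        return weights.copy()
    # k-th smallest absolute value becomes the pruning threshold.
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    return np.where(np.abs(weights) <= threshold, 0.0, weights)

w = np.random.randn(64, 64)                    # stand-in for one conv layer
pruned = prune_by_magnitude(w, sparsity=0.5)   # at least half the weights -> 0
```

After pruning, the surviving weights can be stored sparsely, which is what shrinks the per-device memory footprint before the fused layers are partitioned across devices.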