Abstract: Model compression has emerged as an important area of research for deploying deep learning models on Internet-of-Things (IoT) devices. However, for extremely memory-constrained scenarios, even the compressed models cannot fit within the memory of a single device and, as a result, must be distributed across multiple devices. This leads to a distributed inference paradigm in which memory and communication costs represent a major bottleneck. Yet, existing model compression techniques are not communication-aware. Therefore, …
“…With ever-more devices available on the edge, this network of devices must be exploited to improve model accuracy without increasing the communication latency. Towards this, we describe our recent work on Network-of-Neural Networks (NoNN) [1] for memory- and communication-aware model compression.…”
Section: Communication-aware Model Compression
“…These patterns of activations reveal how knowledge learned by the teacher network is distributed at the final convolution layer. Therefore, we first use these patterns to create a filter activation network [1] which represents how knowledge is distributed across multiple filters (see Fig. 3(b)).…”
Section: Network-of-Neural Networks
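The filter activation network described above can be sketched as a co-activation graph over the teacher's final-convolution filters. The binarization rule and co-activation statistic below are illustrative assumptions, not NoNN's exact importance metric (see [1] for the actual construction):

```python
import numpy as np

def filter_activation_network(activations: np.ndarray) -> np.ndarray:
    """Build a filter co-activation graph.

    activations: (num_samples, num_filters) average-pooled responses from
    the teacher's final convolution layer (hypothetical input format).
    """
    # Binarize: call a filter "active" on a sample if its response
    # exceeds that filter's mean response (an assumed rule).
    active = activations > activations.mean(axis=0, keepdims=True)
    # Edge weight = how often two filters fire together across samples.
    co_activation = active.T.astype(float) @ active.astype(float)
    np.fill_diagonal(co_activation, 0.0)  # no self-loops
    return co_activation

acts = np.random.rand(1000, 64)          # e.g. 1000 samples, 64 filters
graph = filter_activation_network(acts)  # 64x64 symmetric adjacency matrix
```

The resulting symmetric adjacency matrix is what a network-partitioning step can then split into disjoint filter groups.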
“…3(b)). We then partition this network into disjoint subsets via community detection [10], a network partitioning technique (see [1] for details).…”
Section: Network-of-Neural Networks
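Partitioning such a filter graph into disjoint subsets can be illustrated with a numpy-only spectral bisection (sign of the Fiedler vector). NoNN itself uses a dedicated community-detection algorithm [10], so this stand-in is only a sketch of the partitioning idea on an assumed 16-filter adjacency:

```python
import numpy as np

# Assumed input: a symmetric co-activation adjacency with zero diagonal.
rng = np.random.default_rng(0)
upper = np.triu(rng.random((16, 16)), 1)
adj = upper + upper.T

# Graph Laplacian L = D - A; its second-smallest eigenvector (Fiedler
# vector) gives a classic two-way cut of the graph.
degree = np.diag(adj.sum(axis=1))
laplacian = degree - adj
eigvals, eigvecs = np.linalg.eigh(laplacian)   # eigenvalues ascending
fiedler = eigvecs[:, 1]

part_a = np.flatnonzero(fiedler >= 0)  # filters for one student module
part_b = np.flatnonzero(fiedler < 0)   # disjoint filters for the other
```

Each disjoint subset of filters then becomes the target knowledge for one small student network deployed on its own device.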
“…However, due to their enormous computational complexity, deploying such models on constrained devices has emerged as a critical bottleneck for large-scale adoption of intelligence at the IoT edge. It has been estimated that the number of connected IoT-devices will reach one trillion across various market segments by 2035; this provides us a unique opportunity for integrating widespread intelligence in edge devices. Such an exponential growth in IoT-devices necessitates new breakthroughs in Artificial Intelligence research that can more effectively deploy learning at the edge and, therefore, truly exploit the setup of trillions of IoT-devices.…”
Section: Introduction
“…Finally, since IoT naturally implies a network of connected devices, it automatically opens the door to a new class of problems: communication-aware model compression [1]. For instance, many smart home/city applications can have numerous connected IoT-sensors with, say, 500KB total memory per node.…”
The significant computational requirements of deep learning present a major bottleneck for its large-scale adoption on hardware-constrained IoT-devices. Here, we envision a new paradigm called EdgeAI to address major impediments associated with deploying deep networks at the edge. Specifically, we discuss the existing directions in computation-aware deep learning and describe two new challenges in the IoT era: (1) Data-independent deployment of learning, and (2) Communication-aware distributed inference. We further present new directions from our recent research to alleviate the latter two challenges. Overcoming these challenges is crucial for rapid adoption of learning on IoT-devices in order to truly enable EdgeAI.
Internet of Things (IoT) edge intelligence has emerged by optimizing the deep learning (DL) models deployed on resource-constrained devices for quick decision-making. In addition, edge intelligence reduces network overload and latency by bringing intelligent analytics closer to the source. On the other hand, DL models need a lot of computing resources. As a result, they have high computational workloads and memory footprints, making them impractical to deploy and execute on IoT edge devices with limited capabilities. In addition, existing layer-based partitioning methods generate many intermediate results, resulting in a huge memory footprint. In this article, we propose a framework that provides a comprehensive solution enabling the deployment of convolutional neural networks (CNNs) onto distributed IoT devices for faster inference and a reduced memory footprint. This framework considers a pre-trained YOLOv2 model and applies a weight-pruning technique to remove non-contributing parameters. We use the fused-layer partitioning method to vertically partition the fused layers of the CNN and then distribute the partitions among the edge devices to process the input. In our experiments, we considered multiple Raspberry Pi boards as edge devices; a Raspberry Pi with a neural compute stick acts as a gateway device that combines the results from the edge devices to produce the final output. Our proposed model achieved an inference latency of 5 to ~7 seconds for 3×3 to 5×5 fused-layer partitioning across five devices, with a 9% improvement in memory footprint.
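As one illustration of the weight-pruning step, a common magnitude-based criterion zeroes out the fraction of weights smallest in absolute value; the article's exact pruning criterion for YOLOv2 may differ, and the function name and sparsity level below are assumptions for the sketch:

```python
import numpy as np

def prune_by_magnitude(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the `sparsity` fraction of weights smallest in magnitude."""
    k = int(sparsity * weights.size)
    if k == 0:
        return weights.copy()
    # k-th smallest absolute value becomes the pruning threshold.
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    return np.where(np.abs(weights) <= threshold, 0.0, weights)

w = np.random.randn(64, 64)                    # stand-in for one conv layer
pruned = prune_by_magnitude(w, sparsity=0.5)   # at least half the weights -> 0
```

After pruning, the surviving weights can be stored sparsely, which is what shrinks the per-device memory footprint before the fused layers are partitioned across devices.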