NestDNN

Fang, Biyi; Zeng, Xiao; Zhang, Mi

doi:10.1145/3241539.3241559

Cited by 192 publications

(25 citation statements)

References 27 publications

“…For example, to adapt to the dynamically changing available resources of IoT devices, Han et al [27] proposed deploying multiple model variants on IoT devices. Fang et al [28] proposed making multiple model variants share parameters to save the limited storage resources of the IoT device. Additionally, to reduce the number of model parameters when the cloud server assists in training the IoT neural network model, many researchers proposed using model compression [29] or knowledge distillation [30] to reduce the amount of model parameter transmission.…”

Section: Cloud-assisted Approach (mentioning)

confidence: 99%

TFormer: A Transmission-Friendly ViT Model for IoT Devices

Ding, Juefei-Xu, et al. 2023

IEEE Trans. Parallel Distrib. Syst.

Deploying high-performance vision transformer (ViT) models on ubiquitous Internet of Things (IoT) devices to provide high-quality vision services will revolutionize the way we live, work, and interact with the world. Due to the contradiction between the limited resources of IoT devices and resource-intensive ViT models, the use of cloud servers to assist ViT model training has become mainstream. However, due to the larger number of parameters and floating-point operations (FLOPs) of the existing ViT models, the model parameters transmitted by cloud servers are large and difficult to run on resource-constrained IoT devices. To this end, this paper proposes a transmission-friendly ViT model, TFormer, for deployment on resource-constrained IoT devices with the assistance of a cloud server. The high performance and small number of model parameters and FLOPs of TFormer are attributed to the proposed hybrid layer and the proposed partially connected feed-forward network (PCS-FFN). The hybrid layer consists of nonlearnable modules and a pointwise convolution, which can obtain multitype and multiscale features with only a few parameters and FLOPs to improve the TFormer performance. The PCS-FFN adopts group convolution to reduce the number of parameters. The key idea of this paper is to propose TFormer with few model parameters and FLOPs to facilitate applications running on resource-constrained IoT devices to benefit from the high performance of the ViT models. Experimental results on the ImageNet-1K, MS COCO, and ADE20K datasets for image classification, object detection, and semantic segmentation tasks demonstrate that the proposed model outperforms other state-of-the-art models. Specifically, TFormer-S achieves 5% higher accuracy on ImageNet-1K than ResNet18 with 1.4× fewer parameters and FLOPs.

“…Model compression techniques can further reduce the number of model parameters based on our method. The study of adapting IoT devices [27], [28] can also be built on our method.…”

Section: Cloud-assisted Approach (mentioning)

confidence: 99%

“…To mitigate/avoid these problems, [6], [7] suggested on-device resource management. DeepMon [6] aims to guarantee continuous vision apps by optimizing the convolutional neural networks (CNN) on mobile GPUs.…”

Section: A Resource Management For Multiple Vision Apps (mentioning)

confidence: 99%

“…It accelerated the convolution by reusing the intermediate results via caching. NestDNN [7] proposed a filter pruning for resource management. However, both DeepMon and NestDNN support only non-real-time tasks, probably because the most commonly used real-time tasks (e.g., object detection or tracking) require significant amounts of computation that the edge system cannot complete in a timely manner (e.g., DeepMon shows only 1∼2 FPS).…”

Section: A Resource Management For Multiple Vision Apps (mentioning)

confidence: 99%

DynaMIX: Resource Optimization for DNN-Based Real-Time Applications on a Multi-Tasking System

Cho, Shin 2023

Preprint

As deep neural networks (DNNs) prove their importance and feasibility, more and more DNN-based apps, such as detection and classification of objects, have been developed and deployed on autonomous vehicles (AVs). To meet their growing expectations and requirements, AVs should "optimize" use of their limited onboard computing resources for multiple concurrent in-vehicle apps while satisfying their timing requirements (especially for safety). That is, real-time AV apps should share the limited on-board resources with other concurrent apps without missing their deadlines dictated by the frame rate of a camera that generates and provides input images to the apps. However, most, if not all, of existing DNN solutions focus on enhancing the concurrency of their specific hardware without dynamically optimizing/modifying the DNN apps' resource requirements, subject to the number of running apps, owing to their high computational cost. To mitigate this limitation, we propose DynaMIX (Dynamic MIXed-precision model construction), which optimizes the resource requirement of concurrent apps and aims to maximize execution accuracy. To realize a real-time resource optimization, we formulate an optimization problem using app performance profiles to consider both the accuracy and worst-case latency of each app. We also propose dynamic model reconfiguration by lazy loading only the selected layers at runtime to reduce the overhead of loading the entire model. DynaMIX is evaluated in terms of constraint satisfaction and inference accuracy for a multi-tasking system and compared against state-of-the-art solutions, demonstrating its effectiveness and feasibility under various environmental/operating conditions.

“…Targeting on-device deep learning, some researchers define multi-tenant as processing multiple computer vision applications for multiple concurrent tasks [77,78]. However, they focus on the multi-tenant on-device inference rather than training.…”

Section: Multi-tenancy Of Federated Learning (mentioning)

confidence: 99%

Toward a generic federated learning platform optimized for computer vision applications

Zhuang

Chapter 3 is published as Weiming Zhuang, Xin Gan, Yonggang Wen, and Shuai Zhang, "EasyFL: A Low-code Federated Learning Platform for Dummies", IEEE Internet of Things Journal (IOT-J), 2022. The contributions of the co-authors are as follows:
• I proposed the idea, designed the system architecture, wrote the manuscript, and revised the paper.
• Xin Gan and I co-implemented the system and conducted the experiments.
• Prof. Yonggang Wen provided insightful comments on the idea, the system, and the manuscript.
