Split Computing and Early Exiting for Deep Learning Applications: Survey and Research Challenges

Preprint, 2021
DOI: 10.48550/arxiv.2103.04505

Abstract: Mobile devices such as smartphones and autonomous vehicles increasingly rely on deep neural networks (DNNs) to execute complex inference tasks such as image classification and speech recognition, among others. However, continuously executing the entire DNN on the mobile device can quickly deplete its battery. Although task offloading to edge devices may decrease the mobile device's computational burden, erratic patterns in channel quality, network, and edge server load can lead to a significant delay in task ex…

Cited by 12 publications (19 citation statements). References 88 publications (228 reference statements).

Citation statements:
“…Split Computing (SC) [48] is a framework that divides the DNN model into head and tail models, which are executed on the edge device and the server, respectively. SC is attractive when compressed models for edge devices cannot achieve the same level of accuracy as their full counterpart models.…”
Section: Split Computation
confidence: 99%
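The head/tail division described in this statement is easy to express in code. Below is a minimal sketch in Python with PyTorch; the toy model, the choice of split_index, and the head/tail names are illustrative assumptions, not an implementation from the survey or the citing paper.

```python
import torch
import torch.nn as nn

# Toy DNN standing in for a real classifier (illustrative only).
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 10),
)

# Hypothetical split point: layers before it form the head model
# (run on the mobile/edge device); the rest form the tail model
# (run on the edge server).
split_index = 4
head = model[:split_index]
tail = model[split_index:]

x = torch.randn(1, 3, 224, 224)   # input captured on the device
z = head(x)                       # intermediate tensor sent over the channel
y = tail(z)                       # server completes the inference
```

In a real deployment, the intermediate tensor z would be serialized (and typically compressed) before transmission; split computing pays off when z is cheaper to send than the raw input.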
“…We use image data with relatively high resolution, including the ImageNet [52], COCO [53], and PASCAL VOC [54] datasets. As pointed out in [55], split computing is mainly beneficial for supervised tasks involving high-resolution images, e.g., 224 × 224 pixels or larger. For smaller data, either local processing or full offloading would be more suitable.…”
Section: Choice of Datasets
confidence: 99%
“…Other surveys. There have been certain previous surveys touching on the topic of early-exiting, either only briefly discussing it from the standpoint of dynamic inference networks [21] or combining it with offloading [54]. To the best of our knowledge, this is the first study that primarily focuses on early-exit networks and their design trade-offs across tasks, modalities and target hardware.…”
Section: Adaptive Inference Landscape
confidence: 99%
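The other technique named in the title, early exiting, can be sketched the same way: auxiliary classifiers are attached to intermediate layers, and inference stops at the first exit whose prediction is confident enough. The architecture, layer sizes, and the 0.9 confidence threshold below are assumptions for illustration, not the survey's reference design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EarlyExitNet(nn.Module):
    """Toy network with one early exit (illustrative only)."""

    def __init__(self, num_classes=10, threshold=0.9):
        super().__init__()
        self.block1 = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
        self.exit1 = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                   nn.Linear(16, num_classes))
        self.block2 = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
        self.exit2 = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                   nn.Linear(32, num_classes))
        self.threshold = threshold

    def forward(self, x):
        h = self.block1(x)
        logits = self.exit1(h)
        # If the early classifier is confident enough (single-sample
        # inference assumed here), skip the remaining layers entirely.
        if F.softmax(logits, dim=1).max() >= self.threshold:
            return logits
        h = self.block2(h)
        return self.exit2(h)

y = EarlyExitNet()(torch.randn(1, 3, 224, 224))
```

Combined with the split above, such an exit lets easy inputs terminate on the device while hard inputs continue to the server-side tail.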