IEEE INFOCOM 2020 - IEEE Conference on Computer Communications
DOI: 10.1109/infocom41043.2020.9155237

Distributed Inference Acceleration with Adaptive DNN Partitioning and Offloading

Abstract: Deep neural networks (DNN) are the de facto solution behind many of today's intelligent applications, ranging from machine translation to autonomous driving. DNNs are accurate but resource-intensive, especially for embedded devices such as mobile phones and smart objects in the Internet of Things. To overcome the related resource constraints, DNN inference is generally offloaded to the edge or to the cloud. This is accomplished by partitioning the DNN and distributing computations at the two different ends. Howe…

Cited by 134 publications (70 citation statements)
References 34 publications
“…This problem is similar to the one we face for the exchange communication in Appendix A-B, with some exceptions. The Lagrangian is … By following the same path as in the exchange communication solution in Appendix A-B, we can find the optimal value $(p_k^{\mathrm{FRD}})^* = (E_k^{\mathrm{FRDRF}})^* / (t_k^{\mathrm{FRD}})^*$ as in (39), where, using (88) and the complementary slackness conditions, the optimal value of $(t_k^{\mathrm{FRD}})^*$ can be found as in (40).…”
Section: Solution of Problem (75) (mentioning)
confidence: 99%
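For readers skimming this excerpt out of context: the relation $(p_k^{\mathrm{FRD}})^* = (E_k^{\mathrm{FRDRF}})^*/(t_k^{\mathrm{FRD}})^*$ is simply transmit power expressed as energy over time, evaluated at the optimum, and complementary slackness is what pins down $(t_k^{\mathrm{FRD}})^*$. As a rough, generic sketch only (the actual objective, constraint, and multiplier are defined in the cited paper; $f$, $g$, and $\lambda$ below are placeholders):

```latex
% Generic KKT conditions of the kind the excerpt invokes; f is the
% objective and g <= 0 a constraint -- placeholders, not the cited
% paper's actual formulation.
\begin{align*}
\mathcal{L}(p, t, \lambda) &= f(p, t) + \lambda\, g(p, t)
  && \text{(Lagrangian)} \\
\nabla_{p,t}\, \mathcal{L} &= 0
  && \text{(stationarity at } (p^{*}, t^{*})\text{)} \\
\lambda^{*}\, g(p^{*}, t^{*}) &= 0, \quad \lambda^{*} \ge 0
  && \text{(complementary slackness)}
\end{align*}
```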
“…Notwithstanding, along the continuum the Fog plays a role in addition to the Edge and the Cloud. This led to DINA [28], a fine-grained solution based on matching theory for dynamic DDNN partitioning in fog networks. Regardless, early exits, as proposed by BranchyNet [26], also need to be considered, since stopping inference early at intermediate layers reduces not only response time but also network traffic [29] and the computing capacity required [30].…”
Section: Related Work (mentioning)
confidence: 99%
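To make the early-exit idea in the excerpt above concrete, here is a minimal, hypothetical sketch (plain NumPy; the layer and branch callables are placeholders, not the actual BranchyNet or DINA code): inference runs the backbone layer by layer and returns at the first side branch whose softmax confidence clears a threshold, so easy inputs never pay for the deeper layers.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def early_exit_infer(x, layers, branches, final_head, threshold=0.9):
    """Run `layers` in order; `branches` maps a layer index to a small
    classifier attached at that depth. Stop at the first branch whose
    top softmax probability reaches `threshold`."""
    h = x
    for i, layer in enumerate(layers):
        h = layer(h)
        if i in branches:
            probs = softmax(branches[i](h))
            if probs.max() >= threshold:
                return int(probs.argmax()), i       # early exit here
    probs = softmax(final_head(h))
    return int(probs.argmax()), len(layers)         # ran the full network

# Toy usage with random linear layers (purely illustrative):
rng = np.random.default_rng(0)
layers = [lambda h, W=rng.normal(size=(8, 8)): np.tanh(W @ h) for _ in range(4)]
branches = {1: (lambda h, W=rng.normal(size=(3, 8)): W @ h)}
final_head = lambda h, W=rng.normal(size=(3, 8)): W @ h
pred, depth = early_exit_infer(rng.normal(size=8), layers, branches, final_head)
```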
“…Recently, much effort has focused on accelerating DNN inference through task offloading in MEC environments. Mohammed et al. [15] devised a novel DNN partitioning scheme for an MEC network and applied matching theory to distribute the DNN parts across edge servers, with the aim of minimizing total computation time. Xu et al. [21] investigated DNN inference offloading in an MEC network, assuming that each requested DNN has already been partitioned.…”
Section: Related Work (mentioning)
confidence: 99%
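As an illustration of the partition-point search that this line of work builds on, here is a brute-force sketch in the spirit of layer-wise splitting: pick the split index that minimizes device-side compute plus transfer plus server-side compute. All names and the latency model below are assumptions for illustration; DINA's matching-theory assignment across multiple fog nodes is more involved and not reproduced here.

```python
def split_latency_ms(k, device_ms, server_ms, activation_bytes, bandwidth_bps):
    """End-to-end latency if layers [0, k) run on the device and [k, n) on
    the server. activation_bytes[k] is the data crossing the network at
    split k (activation_bytes[0] = raw input size); a fully local run
    (k == n) sends nothing, and the returned result is assumed negligible."""
    n = len(device_ms)
    tx_ms = 0.0 if k == n else activation_bytes[k] * 8e3 / bandwidth_bps
    return sum(device_ms[:k]) + tx_ms + sum(server_ms[k:])

def best_partition(device_ms, server_ms, activation_bytes, bandwidth_bps):
    """Exhaustively try every split point, including full offload (k == 0)
    and fully local execution (k == n)."""
    n = len(device_ms)
    return min(range(n + 1),
               key=lambda k: split_latency_ms(k, device_ms, server_ms,
                                              activation_bytes, bandwidth_bps))

# Toy example: 4 layers, a slow device, a 10 Mbit/s uplink.
dev = [40.0, 35.0, 30.0, 25.0]                 # ms per layer on the device
srv = [4.0, 3.5, 3.0, 2.5]                     # ms per layer on the server
act = [600_000, 80_000, 20_000, 5_000, 1_000]  # bytes at each split point
k = best_partition(dev, srv, act, 10e6)        # -> 2: run two layers locally
```

Because activations shrink as data moves through a typical network while the raw input is large, the optimum here lands at an intermediate layer rather than at either extreme, which is the effect the partitioning literature exploits.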