Recent years have witnessed rapid growth in deep-network-based services and applications. A practical and critical problem has thus emerged: how to deploy deep neural network models so that they can be executed efficiently. Conventional cloud-based approaches usually run the deep models in data-center servers, incurring large latency because a significant amount of data has to be transferred from the network edge to the data center. In this paper, we propose JALAD, a joint accuracy- and latency-aware execution framework, which decouples a deep neural network so that one part runs on edge devices and the other part runs in the conventional cloud, with only a minimal amount of data transferred between them. Though the idea seems straightforward, we face several challenges: i) how to find the best partition of a deep structure; ii) how to deploy the edge component on a device with only limited computation power; and iii) how to minimize the overall execution latency. Our answers to these questions are a set of strategies in JALAD: 1) a normalization-based in-layer data compression strategy that jointly considers compression rate and model accuracy; 2) a latency-aware deep decoupling strategy that minimizes the overall execution latency; and 3) an edge-cloud structure adaptation strategy that dynamically changes the decoupling under different network conditions. Experiments demonstrate that our solution significantly reduces execution latency: it speeds up overall inference with a guaranteed bound on model accuracy loss.
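To make the partition step concrete, here is a minimal sketch of latency-aware decoupling under simplifying assumptions: per-layer latencies and output sizes are profiled offline, and a fixed compression ratio (from in-layer compression at a chosen accuracy-loss budget) is applied to the transferred feature map. The function `best_split` and all parameter names are illustrative, not JALAD's actual API.

```python
def best_split(edge_ms, cloud_ms, out_bytes, compress_ratio, bandwidth_bps):
    """Return (layer index, latency) of the best edge/cloud split point.

    edge_ms[k]   -- profiled latency of layer k on the edge device (ms)
    cloud_ms[k]  -- profiled latency of layer k on the cloud server (ms)
    out_bytes[k] -- size of layer k's output feature map (bytes)
    """
    best_k, best_latency = None, float("inf")
    for k in range(len(edge_ms)):  # layers 0..k on the edge, the rest in cloud
        # Transfer time for the compressed layer-k output, in milliseconds.
        transfer_ms = (out_bytes[k] / compress_ratio) * 8 / bandwidth_bps * 1e3
        latency = sum(edge_ms[:k + 1]) + transfer_ms + sum(cloud_ms[k + 1:])
        if latency < best_latency:
            best_k, best_latency = k, latency
    return best_k, best_latency

# Example with made-up profiling numbers and a 20 Mbps edge-to-cloud link:
split, latency = best_split(
    edge_ms=[5, 9, 14, 20], cloud_ms=[1, 2, 3, 4],
    out_bytes=[4e6, 1e6, 4e5, 1e5],
    compress_ratio=8, bandwidth_bps=20e6)
```

Because the compression ratio shrinks the transferred feature map, the optimal split tends to move toward early layers whose outputs compress well, which is the trade-off the abstract describes.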
Emerging concerns about data privacy and security have motivated federated learning, which allows computing nodes to synchronize only their locally trained models, rather than their original data, during distributed training. The conventional federated learning architecture, inherited from the parameter-server design, relies on highly centralized topologies and large node-to-server bandwidths. In real-world federated learning scenarios, however, the network capacities between nodes are highly non-uniformly distributed and smaller than those in data centers. As a result, efficiently utilizing the network capacity between computing nodes is crucial for federated learning. In this paper, we propose Bandwidth-Aware Combo (BACombo), a model-segment-level decentralized federated learning framework, to tackle this problem. In BACombo, we propose a segmented gossip aggregation mechanism that makes full use of node-to-node bandwidth to speed up communication. In addition, a bandwidth-aware worker selection model further reduces transmission delay by greedily choosing bandwidth-sufficient workers. Convergence guarantees are provided for BACombo. Experimental results on various datasets demonstrate that the training time is reduced by up to 18× compared with the baselines, without accuracy degradation.
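The following is a minimal sketch of what one segmented gossip round might look like, assuming each worker holds a flat parameter vector and has measured per-peer bandwidths. The names `gossip_round` and `split_segments`, and the round-robin selection over the fastest peers, are illustrative assumptions, not BACombo's actual implementation.

```python
def split_segments(model, num_segments):
    """Split a flat parameter list into contiguous, roughly equal segments."""
    step = -(-len(model) // num_segments)  # ceiling division
    return [model[i:i + step] for i in range(0, len(model), step)]

def gossip_round(local_model, peer_models, bandwidth, num_segments=4, replicas=2):
    """One segmented-gossip round on a single worker.

    peer_models -- {worker_id: flat parameter list}
    bandwidth   -- {worker_id: measured link capacity} (assumed known)
    """
    # Greedy, bandwidth-aware selection: rank peers by link capacity, then
    # spread segment pulls across the fastest ones so every link is used.
    ranked = sorted(peer_models, key=lambda w: bandwidth[w], reverse=True)
    segments = split_segments(local_model, num_segments)
    merged = []
    for s, seg in enumerate(segments):
        chosen = [ranked[(s * replicas + r) % len(ranked)]
                  for r in range(replicas)]
        pulled = [split_segments(peer_models[w], num_segments)[s] for w in chosen]
        # Average the local segment with the replicas pulled from peers.
        merged.extend(sum(vals) / (len(pulled) + 1)
                      for vals in zip(seg, *pulled))
    return merged
```

Pulling different segments from different peers is what lets the aggregation use many node-to-node links in parallel instead of saturating a single node-to-server link.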
Federated learning aims to collaboratively train a machine learning model with possibly geo-distributed workers, and is inherently communication constrained. To achieve communication efficiency, conventional federated learning algorithms let each worker decrease its communication frequency by training the model locally for multiple iterations. The conventional federated learning architecture, inherited from the parameter-server design, relies on highly centralised topologies and large node-to-server bandwidths, and its convergence relies on local stochastic gradient descent training, which usually causes large end-to-end training latency in real-world federated learning scenarios. In this study, the authors therefore propose the adaptive partial gradient aggregation method (FedPGA), a gradient-partial-level decentralised federated learning scheme, to tackle this problem. In FedPGA, they propose a partial gradient exchange mechanism that makes full use of node-to-node bandwidth to speed up communication. In addition, an adaptive model updating method further accelerates convergence by adaptively increasing the step size along stable directions of gradient descent. Experimental results on various datasets demonstrate that the training time is reduced by up to 14× compared with baselines, without accuracy degradation.
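Below is a minimal sketch of the two ingredients described above, partial gradient exchange and stability-based adaptive step sizes, under the assumption that gradients are NumPy vectors. The helper names and the sign-stability heuristic are one hypothetical reading of the abstract, not FedPGA's actual code.

```python
import numpy as np

def partial_exchange(grad, peer_grads, start, size):
    """Average only grad[start:start+size] with the matching peer slices,
    so each round transmits a gradient slice rather than the full vector."""
    sl = slice(start, start + size)
    merged = grad.copy()
    merged[sl] = np.mean([grad[sl]] + [g[sl] for g in peer_grads], axis=0)
    return merged

def adaptive_update(params, grad, prev_grad, lr=0.01, boost=1.5):
    """Take a larger step along coordinates whose descent direction is
    stable, i.e. the gradient sign matches the previous round's."""
    stable = np.sign(grad) == np.sign(prev_grad)
    step = np.where(stable, lr * boost, lr)
    return params - step * grad

# Example: exchange the first half of a gradient, then apply one update.
g = np.array([0.2, -0.1, 0.4, 0.3])
g = partial_exchange(g, [np.array([0.4, -0.3, 0.1, 0.2])], start=0, size=2)
params = adaptive_update(np.zeros(4), g, prev_grad=np.array([0.1, 0.3, 0.5, 0.2]))
```
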