2019 IEEE 21st International Conference on High Performance Computing and Communications; IEEE 17th International Conference on Smart City; IEEE 5th International Conference on Data Science and Systems (HPCC/SmartCity/DSS), 2019
DOI: 10.1109/hpcc/smartcity/dss.2019.00046
Entropy-Based Gradient Compression for Distributed Deep Learning

Cited by 8 publications (9 citation statements)
References 14 publications
“…However, network communication is the major problem of DDL. Several methods [14][15][16][29][30][31][32] can be used to reduce the amount of network traffic, but this comes at a cost in terms of accuracy.…”
Section: Discussion (mentioning)
confidence: 99%
“…Using the obtained entropy information and QuickSelect algorithm, the threshold is calculated and only those gradients with absolute value above the threshold are transmitted in that communication round. The results in [45] showed that up to 1000 times gradient compression is achievable while keeping the accuracy of the model nearly unchanged. Fast FL was proposed by Nori et al [46] which attempts to jointly consider the local weight updates and gradient compression tradeoff in FL.…”
Section: B. Gradient Compression (mentioning)
confidence: 99%
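The excerpt above describes the basic mechanism: pick a magnitude threshold via a quickselect-style partial selection and transmit only the gradient entries whose absolute value exceeds it. The following is a minimal sketch of that kind of threshold sparsification; it assumes a fixed keep ratio in place of the paper's entropy-derived threshold, and the function name sparsify_gradients is illustrative only.

import numpy as np

def sparsify_gradients(grad, keep_ratio=0.001):
    """Keep only the largest-magnitude entries of a gradient tensor.

    Hypothetical sketch: the fixed keep_ratio stands in for the
    entropy-derived threshold described in the paper.
    """
    flat = np.abs(grad).ravel()
    k = max(1, int(flat.size * keep_ratio))
    # np.partition is a quickselect-style partial sort (O(n) on average),
    # so the k-th largest magnitude is found without fully sorting.
    threshold = np.partition(flat, flat.size - k)[flat.size - k]
    mask = np.abs(grad) >= threshold
    sparse = np.where(mask, grad, 0.0)
    residual = grad - sparse  # typically accumulated locally for later rounds
    return sparse, residual, threshold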
“…Abrahamyan et al [44] designed an autoencoder with a lightweight architecture which captures the common patterns in the gradients of the different distributed clients and achieved a 8095 times compression which is 8 times more than DGC. Entropy based gradient compression scheme was proposed by Kuang et al [45] which consisted of an entropy based threshold selection method and a learning rate correction algorithm. Entropy is a well known metric from information theory which here measures the uncertainty or disorder of the gradients.…”
Section: B. Gradient Compression (mentioning)
confidence: 99%
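For context, the entropy referred to in this excerpt is the Shannon entropy of the gradient distribution. A small sketch of how such a statistic could be computed from a histogram of gradient magnitudes is given below; the bin count, the use of magnitudes, and the function name gradient_entropy are assumptions, not the authors' exact procedure.

import numpy as np

def gradient_entropy(grad, num_bins=256):
    """Shannon entropy (in bits) of the gradient-magnitude histogram.

    Illustrative only: the binning scheme is an assumption, not the
    exact formulation from Kuang et al. [45].
    """
    hist, _ = np.histogram(np.abs(grad).ravel(), bins=num_bins)
    p = hist / hist.sum()
    p = p[p > 0]  # drop empty bins so 0 * log(0) terms vanish
    return float(-np.sum(p * np.log2(p)))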