ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp40776.2020.9053448
Cooperative Learning via Federated Distillation over Fading Channels

Abstract: Cooperative training methods for distributed machine learning are typically based on the exchange of local gradients or local model parameters. The latter approach is known as Federated Learning (FL). An alternative solution with reduced communication overhead, referred to as Federated Distillation (FD), was recently proposed that exchanges only averaged model outputs. While prior work studied implementations of FL over wireless fading channels, here we propose wireless protocols for FD and for an enhanced ver…
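The abstract contrasts FL, which exchanges model parameters, with FD, which exchanges only averaged model outputs. The sketch below illustrates that idea under stated assumptions: each device uploads a per-label average of its local softmax outputs, the server averages these tables across devices, and devices add a distillation term to their local loss. All shapes, the weighting factor alpha, and the random stand-in values are hypothetical and not taken from the paper.

```python
import numpy as np

# Illustrative sizes: K devices, C class labels.
K, C = 10, 10
rng = np.random.default_rng(0)

# local_soft_labels[k, c] stands in for the average softmax output of device k's
# model over its local samples with true label c (random placeholders here).
local_soft_labels = rng.dirichlet(np.ones(C), size=(K, C))   # shape (K, C, C)

# Server-side FD aggregation: average the per-label outputs across devices and
# broadcast the resulting C x C table, instead of averaging parameter vectors as in FL.
global_soft_labels = local_soft_labels.mean(axis=0)          # shape (C, C)

def local_fd_loss(logits, labels, teacher, alpha=0.5):
    """Cross-entropy on the true labels plus a KL-style distillation term that
    pulls the local outputs toward the globally averaged soft labels."""
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    ce = -np.log(probs[np.arange(len(labels)), labels] + 1e-12).mean()
    kl = (teacher[labels] * (np.log(teacher[labels] + 1e-12)
                             - np.log(probs + 1e-12))).sum(axis=1).mean()
    return (1 - alpha) * ce + alpha * kl

# Example: one local batch of 4 samples with random logits.
logits = rng.normal(size=(4, C))
labels = np.array([0, 3, 3, 7])
print("local FD loss:", local_fd_loss(logits, labels, global_soft_labels))
```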

Cited by 32 publications (27 citation statements) | References 19 publications
“…While FD performs well when the mobile device-generated data is independently and identically distributed (IID), FD exhibits lower performance than the FL benchmark with model parameter exchange under non-IID data distributions. This was experimentally verified in [6], [7], [8], [9] and in the experiments presented in Section 4. To fill this gap, we design an FL framework with model output exchange that achieves similar or higher performance than previously proposed approaches even under non-IID data distributions.…”
Section: Related Work and Paper Organization (supporting)
confidence: 58%
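The excerpt above attributes FD's performance gap to non-IID data across devices. A minimal sketch of how such a non-IID split is commonly simulated is below, using a Dirichlet partition of class labels over devices; the concentration parameter, device count, and sample counts are illustrative and not taken from the cited works.

```python
import numpy as np

rng = np.random.default_rng(0)
num_devices, num_classes, samples_per_class = 5, 10, 100
labels = np.repeat(np.arange(num_classes), samples_per_class)

# A small concentration (beta << 1) skews each class toward a few devices (non-IID);
# a large value (e.g. 100) approximates the IID case.
beta = 0.1
device_indices = [[] for _ in range(num_devices)]
for c in range(num_classes):
    idx = np.flatnonzero(labels == c)
    rng.shuffle(idx)
    proportions = rng.dirichlet(beta * np.ones(num_devices))
    splits = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
    for d, part in enumerate(np.split(idx, splits)):
        device_indices[d].extend(part.tolist())

# Per-device class histograms make the skew visible.
for d in range(num_devices):
    counts = np.bincount(labels[device_indices[d]], minlength=num_classes)
    print(f"device {d}: {counts}")
```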
“…Federated learning with model output exchange over mobile device-generated datasets. FD is proposed in [6], [7], [8], [9] as an FL framework with model output exchange that trains ML models on mobile device-generated datasets. Unlike CD and PATE, which train distributed ML models using a shared dataset, each mobile device trains its own ML model using a local dataset, enabling ML model training with mobile device-generated data.…”
Section: Related Work and Paper Organization (mentioning)
confidence: 99%
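The excerpt emphasizes that FD exchanges model outputs computed on each device's local data rather than model parameters. The back-of-the-envelope comparison below illustrates the resulting difference in per-round upload size; the parameter count and label count are hypothetical, purely for illustration.

```python
# Hypothetical sizes, not taken from the paper or the cited works.
num_params, num_classes = 1_000_000, 10

# FL payload per round: the full local parameter vector.
fl_payload = num_params                    # values per device per round

# FD payload per round: one averaged output vector per class label,
# computed on the device's local dataset (no shared dataset is required,
# unlike co-distillation or PATE).
fd_payload = num_classes * num_classes     # values per device per round

print(f"FL uploads {fl_payload:,} values; FD uploads {fd_payload:,} values "
      f"(~{fl_payload / fd_payload:,.0f}x reduction)")
```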
“…4(b) (where 0 < λ < 1) into Eq. (7). The dependence of the training loss on the transmission rate is insignificant at any K, even when the rate is reduced from 6 Mbps to 3 Mbps.…”
Section: B. Effects of Network Density on C-SGD (mentioning)
confidence: 90%
“…Some researchers have investigated quantization and sparsification of the data that must be communicated, in order to reduce the communication load. In particular, compressed sensing [6], [29] and digital data coding [7], [8] have been explored for data reduction when communicating with a central server. These approaches have been extended to be applicable to wireless systems.…”
Section: Related Work: A. Centralized Setting (mentioning)
confidence: 99%
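The excerpt refers to quantization and sparsification of the communicated data. The sketch below shows a generic top-k gradient sparsification followed by uniform scalar quantization of the surviving values; it illustrates the general idea only, not the specific compressed-sensing or coding schemes of [6], [29], [7], [8].

```python
import numpy as np

def sparsify_topk(grad, k):
    """Keep only the k largest-magnitude entries; transmit (index, value) pairs."""
    idx = np.argpartition(np.abs(grad), -k)[-k:]
    return idx, grad[idx]

def quantize_uniform(values, bits=4):
    """Uniform scalar quantization of the kept values to signed integers."""
    scale = np.abs(values).max() + 1e-12
    levels = 2 ** (bits - 1) - 1
    return np.round(values / scale * levels).astype(np.int8), scale

def reconstruct(idx, q, scale, dim, bits=4):
    """Server-side reconstruction of the sparse, quantized gradient."""
    levels = 2 ** (bits - 1) - 1
    grad_hat = np.zeros(dim)
    grad_hat[idx] = q.astype(np.float64) / levels * scale
    return grad_hat

rng = np.random.default_rng(0)
g = rng.normal(size=10_000)
idx, vals = sparsify_topk(g, k=100)        # keep 1% of the entries
q, scale = quantize_uniform(vals)
g_hat = reconstruct(idx, q, scale, g.size)
print("relative reconstruction error:", np.linalg.norm(g - g_hat) / np.linalg.norm(g))
```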