Published: 2023
DOI: 10.1109/jstsp.2022.3223526

Knowledge Selection and Local Updating Optimization for Federated Knowledge Distillation With Heterogeneous Models

Cited by 4 publications (6 citation statements)
References 15 publications
“…In the experiments, ten clients participate in the distillation process, and we evaluate the model’s performance under two non-IID distribution settings: a strong non-IID setting and a weak non-IID setting, where each client has one unique class and two classes, respectively. Several representative federated distillation methods are compared, including FedMD 13 , FedED 19 , DS-FL 20 , FKD 34 , and PLS 26 . Among them, FedMD, FedED, and DS-FL rely on a proxy dataset to transfer knowledge, while FKD and PLS are data-free KD approaches that share class-wise average predictions among users.…”
Section: Results (mentioning, confidence: 99%)
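
For concreteness, a minimal sketch of how such a class-based non-IID partition could be produced; the helper name, the use of NumPy, and the cycling assignment of classes are illustrative assumptions rather than the cited papers' exact procedure:

import numpy as np

def partition_non_iid(labels, num_clients=10, classes_per_client=1, seed=0):
    """Assign each client samples drawn from a fixed number of classes.

    classes_per_client=1 mimics the 'strong' non-IID setting described above,
    classes_per_client=2 the 'weak' one. For simplicity, clients that are
    assigned the same class share that class's full index set.
    """
    rng = np.random.default_rng(seed)
    classes = np.unique(labels)
    # Cycle through the classes so consecutive clients get distinct subsets.
    client_classes = [
        [classes[(c * classes_per_client + k) % len(classes)]
         for k in range(classes_per_client)]
        for c in range(num_clients)
    ]
    client_indices = []
    for owned in client_classes:
        idx = np.where(np.isin(labels, owned))[0]
        rng.shuffle(idx)
        client_indices.append(idx)
    return client_indices

# Example: 10 clients, strong non-IID split of a 10-class label vector.
labels = np.repeat(np.arange(10), 100)
splits = partition_non_iid(labels, num_clients=10, classes_per_client=1)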
“…Nevertheless, without a well-trained teacher, FD relies on the ensemble of local predictors for distillation, making it sensitive to the training state of local models, which may suffer from poor quality and underfitting. Besides, the non-identically independently distributed (non-IID) data distributions 24 , 25 across clients exacerbate this issue, since the local models cannot output accurate predictions on the proxy samples that are outside their local distributions 26 . To address the negative impact of misleading knowledge, an alternative is to incorporate soft labels (i.e., normalized logits) 17 during knowledge distillation to enhance the generalization performance.…”
Section: Introduction (mentioning, confidence: 99%)
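
As a reference point for the soft-label idea mentioned in that statement, here is a minimal PyTorch-style sketch of distillation against ensemble soft labels; the temperature value and the simple averaging of client logits into a teacher signal are assumptions for illustration:

import torch
import torch.nn.functional as F

def soft_label_distillation_loss(student_logits, client_logits_list, temperature=3.0):
    """KL divergence between the student's softened predictions and the
    ensemble soft labels obtained by averaging the clients' logits."""
    # Ensemble 'teacher' signal: average of the participating clients' logits.
    teacher_logits = torch.stack(client_logits_list).mean(dim=0)
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    # Standard KD scaling by T^2 keeps gradient magnitudes comparable.
    return F.kl_div(log_probs, soft_targets, reduction="batchmean") * temperature ** 2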
“…In the experiments, ten clients participate in the distillation process, and we evaluate the model's performance under two non-IID distribution settings across clients: a strong non-IID setting and a weak non-IID setting, where each client has one unique class and two classes, respectively. Several representative federated distillation methods are compared, including FedMD 12 , FedED 17 , DS-FL 18 , FKD 31 , and PLS 23 . Among them, FedMD, FedED, and DS-FL rely on a proxy dataset to transfer knowledge, while FKD and PLS are data-free KD approaches that share class-wise average predictions among users.…”
Section: Performance Evaluation (mentioning, confidence: 99%)
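
To illustrate the data-free variant that statement attributes to FKD and PLS, the sketch below aggregates class-wise average predictions on a client; the function and variable names are hypothetical and not taken from those papers:

import torch

def classwise_average_predictions(logits, labels, num_classes):
    """Average a client's softmax outputs per ground-truth class.

    The resulting num_classes x num_classes matrix is the kind of compact,
    data-free knowledge that can be shared instead of a proxy dataset.
    """
    probs = torch.softmax(logits, dim=-1)
    avg = torch.zeros(num_classes, num_classes)
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            avg[c] = probs[mask].mean(dim=0)
    return avg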
“…Despite the potential for improving efficiency and privacy, FD is sensitive to the training state of local models due to the lack of a well-trained teacher, where the ensemble predictions may have low quality due to the under-fitted local predictors. Besides, the non-identically independently distributed (non-IID) data distributions 21,22 across clients exacerbate this issue, since the local models cannot output accurate predictions on the proxy samples that are outside their local distributions 23 .…”
Mentioning (confidence: 99%)
“…(2) KD-based FL needs a dataset for distillation, which can be client private data [59], publicly available data [10], or artificially generated synthetic data [181]. (3) Typically, KD-based FL lacks a pre-trained teacher model [146], and the initial training performance of the teacher model is suboptimal. However, the teacher model gradually improves reliability and convergence as the training progresses.…”
Section: KD-based FL (mentioning, confidence: 99%)
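
A schematic round of the dataset-based distillation described in that statement, where the ensemble of client predictions on a shared public batch acts as the (initially weak) teacher, might look like the following; the model, optimizer, and data-loader names are placeholders, not an implementation from the cited works:

import torch
import torch.nn.functional as F

def distillation_round(global_model, client_models, public_loader, optimizer, temperature=2.0):
    """One KD-based FL round: the ensemble of client predictions on a public
    dataset is distilled into the global model. Early rounds give a weak
    teacher; its quality improves as the clients' local training progresses."""
    for m in client_models:
        m.eval()
    global_model.train()
    for x, _ in public_loader:  # labels of the public data are not used
        with torch.no_grad():
            teacher_logits = torch.stack([m(x) for m in client_models]).mean(dim=0)
        student_logits = global_model(x)
        loss = F.kl_div(
            F.log_softmax(student_logits / temperature, dim=-1),
            F.softmax(teacher_logits / temperature, dim=-1),
            reduction="batchmean") * temperature ** 2
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()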