2022
DOI: 10.1609/aaai.v36i11.21620
Class-Wise Adaptive Self Distillation for Federated Learning on Non-IID Data (Student Abstract)

Abstract: Federated learning (FL) enables multiple clients to collaboratively train a globally generalized model while keeping local data decentralized. A key challenge in FL is handling the heterogeneity of data distributions among clients. When fitting local data, the local model drifts away from the global features, which results in forgetting the global knowledge. Following the idea of knowledge distillation, the global model's prediction can be utilized to help local models preserve the global knowledge in FL. However, wh…

Cited by 8 publications (17 citation statements). References 1 publication (1 reference statement).
“…KD-based solutions can be used to handle data heterogeneity either at server side, rectifying FedAvg's global model via ensemble distillation on a proxy dataset [30,41,7] or using a data-free generator [54,53], or at client side, distilling global knowledge via on-device regularizers [52,25,17,16] or synthetically-generated data [57], directly controlling the phenomenon of client drift.…”
Section: Data-Distribution-Agnostic FL via KD
confidence: 99%
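The client-side variant mentioned above (distilling global knowledge via an on-device regularizer) can be sketched as follows. This is a generic illustration in PyTorch rather than the exact objective of any cited method; `local_model`, `global_model`, `kd_weight`, and `temperature` are assumed names.

```python
import torch
import torch.nn.functional as F

def local_loss(local_model, global_model, x, y, kd_weight=1.0, temperature=2.0):
    """Cross-entropy on the local labels plus a KL term that keeps the local
    model's predictions close to those of the frozen global (teacher) model."""
    local_logits = local_model(x)
    with torch.no_grad():                      # the global model is not updated locally
        global_logits = global_model(x)
    ce = F.cross_entropy(local_logits, y)      # fit the local data
    kd = F.kl_div(                             # preserve the global knowledge
        F.log_softmax(local_logits / temperature, dim=1),
        F.softmax(global_logits / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2
    return ce + kd_weight * kd
```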
“…Inspired by the work of Lukasik et al. [33], He et al. further observe that, in the framework of Fig. 1, leveraging an inaccurate global model (i.e., an inaccurate teacher) on specific classification classes might mislead local training [17]. To alleviate this phenomenon, a class-wise adaptive weight is proposed in FedCAD [17] to control the impact of distillation: when the global model is accurate on a certain class, local models learn more from the distilled knowledge.…”
Section: Local Distillation of Global Knowledge
confidence: 99%
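A minimal sketch of the class-wise adaptive idea described in this statement, assuming a PyTorch setup: the distillation term for each sample is scaled by a weight attached to its class, so classes on which the global model is accurate contribute more. The weighting rule and the names (`class_weights`, `class_wise_kd_loss`) are illustrative, not FedCAD's exact formulation.

```python
import torch
import torch.nn.functional as F

def class_wise_kd_loss(local_logits, global_logits, targets, class_weights,
                       temperature=2.0):
    """Per-sample KL distillation, scaled by the weight of each sample's class:
    classes the global model handles reliably contribute more to the loss."""
    log_p_local = F.log_softmax(local_logits / temperature, dim=1)
    p_global = F.softmax(global_logits / temperature, dim=1)
    # KL divergence per sample (sum over the class dimension)
    per_sample_kl = F.kl_div(log_p_local, p_global, reduction="none").sum(dim=1)
    weights = class_weights[targets]           # weight chosen by each sample's label
    return (weights * per_sample_kl).mean() * temperature ** 2
```

Here `class_weights` could, for instance, be the global model's per-class accuracy on server-held data normalised to [0, 1]; the exact rule used by FedCAD may differ.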
“…First, the movement patterns of individual animals are often drawn from distinct distributions, which inevitably results in data heterogeneity between clients. Such data heterogeneity enlarges the inconsistency of learned features across clients, easily raising drift concerns between client updates, since each client model is optimised towards its local objective instead of the global optimum during local training [13,14,15]. To address this issue, some existing methods [14,15,16,17] impose constraints on the local optimisation by exploiting a model-level regularisation term, which aims to facilitate all local models to approach consistent views.…”
Section: Introduction
confidence: 99%
“…For instance, FedProx restricted local model parameters to be close to global parameters by adding a proximal term in the local training [16].…”
Section: Introduction
confidence: 99%
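The FedProx proximal term mentioned here admits a compact sketch. Assuming PyTorch, with `mu` and `global_params` as illustrative names, the penalty keeps the local weights near the global weights received at the start of the round.

```python
import torch

def proximal_term(local_model, global_params, mu=0.01):
    """FedProx-style penalty: (mu / 2) * ||w_local - w_global||^2 over all parameters."""
    prox = 0.0
    for w_local, w_global in zip(local_model.parameters(), global_params):
        prox = prox + (w_local - w_global.detach()).pow(2).sum()
    return 0.5 * mu * prox

# In each local step the client would minimise, e.g.:
#   loss = task_loss + proximal_term(local_model, global_params, mu)
```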