“…Many FL variant algorithms (Li et al., 2020; Karimireddy et al., 2020b; Wang et al., 2020; Acar et al., 2021; Luo et al., 2021; Li et al., 2021c; Chen & Chao, 2021; Collins et al., 2021) have been developed to tackle the data heterogeneity problem, in which clients typically have different data distributions and/or different data sizes, making simple FL algorithms such as FedAvg slow to converge and prone to poor generalization (Woodworth et al., 2020; Acar et al., 2021). These algorithms are not necessarily limited to exchanging model parameters during training; they may also exchange other quantities, such as intermediate features (Collins et al., 2021), model masks (Li et al., 2021a), auxiliary gradient corrections (Karimireddy et al., 2020b), and third-party datasets (Lin et al., 2020; Tang et al., 2022). Moreover, many FL algorithms require stateful clients that store local state, such as control variates (Karimireddy et al., 2020b), old gradients (Acar et al., 2021), personalized models or layers (Liang et al., 2020; Chen & Chao, 2021), and model masks (Li et al., 2021a).…”
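To make the baseline concrete, the following is a minimal sketch of FedAvg-style aggregation, the simple scheme the paragraph says struggles under heterogeneity. All names (`fed_avg`, the flat-list parameter representation) are illustrative assumptions, not the algorithm as implemented in any of the cited works.

```python
def fed_avg(client_params, client_sizes):
    """Average client model parameters, weighted by local dataset size.

    Heterogeneous client_sizes illustrate why plain FedAvg can struggle:
    the weighted average drifts toward the local optima of clients that
    hold more data, which the cited variants try to correct (e.g. with
    control variates or gradient corrections).
    """
    total = sum(client_sizes)
    dim = len(client_params[0])
    global_params = [0.0] * dim
    for params, n in zip(client_params, client_sizes):
        weight = n / total  # larger clients contribute more
        for i, p in enumerate(params):
            global_params[i] += weight * p
    return global_params

# Example: two clients with unequal data sizes; the result lies
# closer to the larger client's parameters.
clients = [[1.0, 2.0], [3.0, 4.0]]
sizes = [10, 30]
print(fed_avg(clients, sizes))  # → [2.5, 3.5]
```

Stateful variants such as SCAFFOLD extend this loop by additionally maintaining per-client control variates between rounds, which is why they require clients that can persist state.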