“…This simple paradigm suffers from performance degradation in the presence of data heterogeneity [20,25]. Numerous studies have addressed label-space heterogeneity, i.e., class distributions that are imbalanced across clients, by regularizing local updates with a proximal term [26], personalizing client models [2,8,37,27], utilizing shared local data [44,30,10], introducing additional proxy datasets [24,29,11], or performing data-free knowledge distillation [32] in the input space [13,42,43] or the feature space [15,48]. However, only a few studies address heterogeneity in the feature space, i.e., non-IID features.…”
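To make the first of these strategies concrete, the following is a minimal sketch of a client update regularized with a proximal term. It assumes that each client minimizes its own loss f_k(w) plus (mu / 2) * ||w - w_global||^2 using plain gradient descent. The function name `proximal_local_update`, the callable `grad_fn`, and the hyperparameters `mu`, `lr`, and `steps` are hypothetical illustrations. They are not the exact algorithm of the work cited as [26].

```python
def proximal_local_update(w_global, grad_fn, mu=0.01, lr=0.1, steps=10):
    """Hypothetical sketch of one client's local training with a proximal term.

    Minimizes f_k(w) + (mu / 2) * ||w - w_global||^2 by plain gradient descent,
    where grad_fn(w) (an assumed callable) returns the gradient of the client's
    own loss f_k at w as a list of floats.
    """
    w = list(w_global)  # start local training from the current global model
    for _ in range(steps):
        g = grad_fn(w)
        # The proximal term contributes mu * (w - w_global) to the gradient,
        # pulling the local model back toward the global model and limiting
        # client drift when local data distributions differ.
        w = [wi - lr * (gi + mu * (wi - wgi))
             for wi, gi, wgi in zip(w, g, w_global)]
    return w


# Usage: a toy client whose own optimum is c = [1.0, 2.0], with loss 0.5 * ||w - c||^2.
client_grad = lambda w: [w[0] - 1.0, w[1] - 2.0]
w_local = proximal_local_update([0.0, 0.0], client_grad, mu=1.0, lr=0.1, steps=200)
```

For this quadratic toy loss, the regularized optimum is (c + mu * w_global) / (1 + mu). With mu = 1 and a zero global model, the local model therefore settles halfway between the client optimum and the global model instead of drifting all the way to the client optimum. Larger values of mu keep it closer to the global model.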