“…For this smooth formulation, variants of decentralized stochastic gradient descent (DSGD), e.g., [4,26,52,70], admit simple implementations yet provide competitive practical performance against centralized methods in homogeneous environments like data centers. When the data distributions across the network become heterogeneous, the performance of DSGD in both practice and theory degrades significantly [15,39,57,59,68]. To address this issue, stochastic methods that are robust to heterogeneous data have been proposed, e.g., D² [51] that is derived from primal-dual formulations [22,25,47,69] and GT-DSGD [29,63] that is based on gradient tracking [10,33,38,41,67].…”
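
The effect of heterogeneity described in the excerpt can be seen in a small toy experiment. The sketch below is not the implementation from any of the cited works: it drops the stochastic gradient noise to isolate heterogeneity, and it assumes a 4-node ring, scalar quadratic local losses f_i(x) = 0.5 a_i (x - b_i)^2, and a constant step size. The function names (`run_dgd`, `run_gradient_tracking`), the mixing matrix `W`, and the values of `a`, `b`, and `alpha` are all illustrative choices. It compares the plain decentralized gradient recursion with the standard gradient-tracking recursion, in which each node keeps an auxiliary variable that tracks the network-average gradient.

```python
# Toy, noise-free comparison on a ring of 4 nodes (all values are assumptions).
# Local loss at node i: f_i(x) = 0.5 * a[i] * (x - b[i])**2.

def mix(W, v):
    """One communication round: node i forms a weighted average of its neighbours."""
    n = len(v)
    return [sum(W[i][j] * v[j] for j in range(n)) for i in range(n)]

def local_grads(x, a, b):
    """Gradient of each node's local loss, evaluated at that node's own iterate."""
    return [a[i] * (x[i] - b[i]) for i in range(len(x))]

def run_dgd(W, a, b, alpha, iters):
    """Plain decentralized gradient descent: x <- W x - alpha * grad(x)."""
    x = [0.0] * len(b)
    for _ in range(iters):
        g = local_grads(x, a, b)
        wx = mix(W, x)
        x = [wx[i] - alpha * g[i] for i in range(len(x))]
    return x

def run_gradient_tracking(W, a, b, alpha, iters):
    """Gradient tracking: x <- W x - alpha * y,  y <- W y + grad(x_new) - grad(x)."""
    x = [0.0] * len(b)
    g = local_grads(x, a, b)
    y = list(g)  # tracker starts at the local gradients
    for _ in range(iters):
        wx = mix(W, x)
        x_new = [wx[i] - alpha * y[i] for i in range(len(x))]
        g_new = local_grads(x_new, a, b)
        wy = mix(W, y)
        y = [wy[i] + g_new[i] - g[i] for i in range(len(y))]
        x, g = x_new, g_new
    return x

# Doubly stochastic mixing matrix for a 4-node ring (illustrative weights).
W = [[0.50, 0.25, 0.00, 0.25],
     [0.25, 0.50, 0.25, 0.00],
     [0.00, 0.25, 0.50, 0.25],
     [0.25, 0.00, 0.25, 0.50]]

# Heterogeneous local problems: different curvatures and different minimizers.
a = [1.0, 2.0, 3.0, 4.0]
b = [-3.0, 1.0, 4.0, 10.0]

# Minimizer of the sum of the local losses (closed form for quadratics).
x_star = sum(ai * bi for ai, bi in zip(a, b)) / sum(a)

alpha, iters = 0.02, 5000
x_dgd = run_dgd(W, a, b, alpha, iters)
x_gt = run_gradient_tracking(W, a, b, alpha, iters)

# Worst-case distance of any node's iterate from the global minimizer.
dgd_err = max(abs(xi - x_star) for xi in x_dgd)
gt_err = max(abs(xi - x_star) for xi in x_gt)

print("global minimizer:", x_star)
print("DGD worst node error:", dgd_err)
print("gradient tracking worst node error:", gt_err)
```

With a constant step size, the plain recursion settles at a point where the nodes disagree and each is offset from the global minimizer, because every node keeps being pulled toward its own local minimizer. The tracking variant removes this offset in this toy setting. Setting all entries of `b` equal (with any positive `a`) makes the local minimizers coincide, and the gap between the two methods disappears.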