“…More recently, gradient tracking has been utilized to further enhance the convergence rate of new methods; see (Lu et al. 2019; Zhang and You 2020; Koloskova, Lin, and Stich 2021; Xin, Khan, and Kar 2021b) for further discussion. Variance reduction methods that mimic the updates of SARAH (Nguyen et al. 2017b) and SPIDER (Wang et al. 2019) attain optimal gradient complexity at the expense of large-batch computations; examples include D-SPIDER-SFO (Pan, Liu, and Wang 2020), D-GET (Sun, Lu, and Hong 2020), GT-SARAH (Xin, Khan, and Kar 2022), and DE-STRESS (Li, Li, and Chi 2022). To avoid the large-batch requirement of these methods, the STORM (Cutkosky and Orabona 2019; Xu and Xu 2023) and Hybrid-SGD (Tran-Dinh et al. 2022a) methods have also been adapted to the decentralized setting; see GT-STORM (Zhang et al. 2021b) and GT-HSGD (Xin, Khan, and Kar 2021a).…”
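To make the recursive-momentum idea behind STORM concrete, the following is a minimal single-node sketch (not the decentralized GT-STORM or GT-HSGD algorithms themselves) on an assumed toy least-squares objective; the step size, momentum parameter, and batch size are illustrative choices, not values from any of the cited papers. The key point is that each fresh minibatch is evaluated at both the new and the previous iterate, so the estimator corrects itself without ever requiring a large batch:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stochastic objective (an assumption for illustration):
# f(x) = 0.5 * E[(a @ x - b)^2] over rows (a, b) of a synthetic dataset.
A = rng.normal(size=(200, 5))
x_star = rng.normal(size=5)   # planted solution; b = A @ x_star exactly
b = A @ x_star

def stoch_grad(x, idx):
    """Minibatch gradient of the least-squares loss at x on samples idx."""
    a = A[idx]
    return a.T @ (a @ x - b[idx]) / len(idx)

def storm(T=500, eta=0.05, a_mom=0.1, batch=10):
    """STORM-style recursive momentum estimator (single-node illustration).

    d_t = grad(x_t; xi_t) + (1 - a_mom) * (d_{t-1} - grad(x_{t-1}; xi_t)),
    i.e., the SAME fresh sample xi_t is evaluated at both iterates.
    """
    x = np.zeros(5)
    idx = rng.integers(0, len(A), batch)
    d = stoch_grad(x, idx)          # initialize with a plain stochastic gradient
    for _ in range(T):
        x_new = x - eta * d         # descend along the current estimator
        idx = rng.integers(0, len(A), batch)
        d = stoch_grad(x_new, idx) + (1 - a_mom) * (d - stoch_grad(x, idx))
        x = x_new
    return x

x_hat = storm()
print(np.linalg.norm(x_hat - x_star))
```

In the decentralized variants cited above, each node maintains such an estimator locally and combines it with gossip averaging and gradient tracking across neighbors; this sketch isolates only the small-batch variance-reduction mechanism.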