“…Among them, gradient tracking (Qu and Li, 2017; Di Lorenzo and Scutari, 2016; Nedic et al., 2017), which applies the idea of dynamic average consensus (Zhu and Martínez, 2010) to global gradient estimation, provides a systematic approach to reducing variance and has been successfully applied to decentralize many algorithms with faster convergence rates (Li et al., 2020a; Sun et al., 2019). For nonconvex problems, a small sample of gradient-tracking-aided algorithms includes GT-SAGA (Xin et al., 2021), D-GET (Sun et al., 2020), GT-SARAH (Xin et al., 2020), and DESTRESS (Li et al., 2021a). Our BEER algorithm also leverages gradient tracking to eliminate the strong assumptions of bounded gradients and bounded dissimilarity.…”
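To make the gradient-tracking idea concrete, here is a minimal NumPy sketch of the generic update (not the BEER algorithm itself): each agent mixes its iterate and tracker with its neighbors, and the tracker applies dynamic average consensus to the local gradients so that its network-wide average always equals the average of the current local gradients. The quadratic objectives, ring topology, mixing weights, and step size below are illustrative assumptions.

```python
import numpy as np

# Hypothetical setup: n agents jointly minimize sum_i f_i(x) with local
# quadratics f_i(x) = 0.5 * a[i] * (x - b[i])**2 over a 4-cycle network.
n = 4
rng = np.random.default_rng(0)
a = rng.uniform(1.0, 2.0, n)           # local curvatures (assumed)
b = rng.uniform(-1.0, 1.0, n)          # local minimizers (assumed)
x_star = np.sum(a * b) / np.sum(a)     # global minimizer of sum_i f_i

def grad(x):
    """Stacked local gradients: grad(x)[i] = f_i'(x[i])."""
    return a * (x - b)

# Doubly stochastic mixing matrix for the ring (lazy Metropolis weights).
W = np.array([[0.50, 0.25, 0.00, 0.25],
              [0.25, 0.50, 0.25, 0.00],
              [0.00, 0.25, 0.50, 0.25],
              [0.25, 0.00, 0.25, 0.50]])

eta = 0.05                             # step size (illustrative)
x = np.zeros(n)
y = grad(x)                            # tracker initialized at local gradients

for _ in range(1000):
    x_new = W @ x - eta * y            # consensus step + tracked-gradient descent
    y = W @ y + grad(x_new) - grad(x)  # dynamic average consensus on gradients
    x = x_new

# Tracking invariant: the mean of y equals the mean of the current local
# gradients at every iteration (W is doubly stochastic, y started at grad(x)).
assert np.isclose(y.mean(), grad(x).mean())
# All agents reach the global minimizer, with no bounded-gradient or
# bounded-dissimilarity assumption on the f_i.
assert np.allclose(x, x_star, atol=1e-6)
```

Note that the local minimizers `b[i]` differ across agents, so the local gradients are arbitrarily dissimilar at the common optimum; the tracker `y` is what lets each agent follow the global descent direction anyway.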