“…One idea is to judiciously evaluate a so-termed snapshot gradient $\nabla f(x^s)$, and use it as an anchor for the stochastic draws in subsequent iterations. Members of the variance reduction family include schemes abbreviated as SDCA [Shalev-Shwartz and Zhang, 2013], SVRG [Johnson and Zhang, 2013], SAG [Roux et al., 2012], SAGA [Defazio et al., 2014], MISO [Mairal, 2013], SARAH [Nguyen et al., 2017], and their variants [Konečný and Richtárik, 2013, Lei et al., 2017, Li et al., 2019, Kovalev et al., 2019]. Most of these algorithms rely on the update $x_{k+1} = x_k - \eta v_k$, where $\eta$ is a constant step size and $v_k$ is an algorithm-specific gradient estimate that takes advantage of the snapshot gradient.…”
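As one concrete instance of such a $v_k$, the SVRG estimator [Johnson and Zhang, 2013] combines a fresh stochastic gradient with the snapshot gradient as $v_k = \nabla f_{i_k}(x_k) - \nabla f_{i_k}(x^s) + \nabla f(x^s)$, which stays unbiased while its variance shrinks as $x_k$ approaches $x^s$. Below is a minimal sketch of this anchored update on a toy least-squares problem; the problem data, step size, and loop lengths are illustrative assumptions, not taken from the excerpt.

```python
# Sketch of an SVRG-style anchored update on a toy least-squares objective
# f(x) = (1/n) * sum_i 0.5 * (a_i^T x - b_i)^2. Hyperparameters are
# assumptions chosen for illustration.
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 10
A = rng.normal(size=(n, d))
b = rng.normal(size=n)

def grad_i(x, i):
    # gradient of the i-th component f_i(x) = 0.5 * (a_i^T x - b_i)^2
    return A[i] * (A[i] @ x - b[i])

def full_grad(x):
    # full gradient (1/n) * sum_i grad f_i(x)
    return A.T @ (A @ x - b) / n

eta = 0.01                  # constant step size
x_snap = np.zeros(d)
for s in range(20):                # outer loop: refresh the snapshot x^s
    g_snap = full_grad(x_snap)     # snapshot gradient  ∇f(x^s)
    x = x_snap.copy()
    for k in range(n):             # inner loop: anchored stochastic steps
        i = rng.integers(n)
        # SVRG estimate v_k: unbiased, variance vanishes as x -> x^s
        v = grad_i(x, i) - grad_i(x_snap, i) + g_snap
        x = x - eta * v            # the common update x_{k+1} = x_k - eta * v_k
    x_snap = x

print("final gradient norm:", np.linalg.norm(full_grad(x_snap)))
```

Other members of the family differ mainly in how $v_k$ is formed: SAG and SAGA maintain a table of the most recent per-component gradients instead of a periodic snapshot, while SARAH updates its estimate recursively.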