“…There is a long line of past work on this algorithm, including convergence guarantees [Tsi94, Sze98, EDM03], results on linear function approximation for optimal stopping problems [TVR99, BRS18], and non-asymptotic rates under general norms in both the i.i.d. setting [Wai19a, Bor21] and the Markovian setting [CMSS21]. Several variants of TD and Q-learning have also been studied in the literature, including actor-critic methods [KT00], SARSA [RN94], and methods that employ variance reduction [SWW+18, KPR+21, Wai19b, KXWJ21].…”