“…4 Several works have studied the private multi-armed bandit problem (Mishra & Thakurta, 2015; Tossou & Dimitrakakis, 2017; Sajed & Sheffet, 2019; Ren et al., 2020a; Chen et al., 2020; Zhou & Tan, 2021; Dubey, 2021), the private contextual linear bandit problem (Shariff & Sheffet, 2018; Zheng et al., 2020; Han et al., 2020; Ren et al., 2020b; Garcelon et al., 2022), and the more general private reinforcement learning problem (Vietri et al., 2020; Garcelon et al., 2021; Chowdhury & Zhou, 2022a), in both the local and centralized models of privacy. The regret gap between the two models (when the contexts are arbitrary rather than stochastic (Han et al., 2021)) has been narrowed using the intermediate sequential shuffle model (Tenenbaum et al., 2021; Chowdhury & Zhou, 2022b; Garcelon et al., 2022). See Section 5 for further discussion of these results for private contextual linear bandits.…”