“…Recently, a number of proposals have utilized reinforcement learning in mobile health or two-sided markets (Ertefaie, 2014; Luckett et al, 2019; Chen et al, 2020; Hu et al, 2019; Liao et al, 2020; Wang et al, 2021; Zhou et al, 2021; Li et al, 2022a,b; Liao et al, 2022; Shi et al, 2022a,b). In addition, there is a growing literature on adapting reinforcement learning to develop dynamic treatment regimes in precision medicine, which recommend treatment decisions based on individual patients' information (Murphy, 2003; Chakraborty et al, 2010; Qian and Murphy, 2011; Zhao et al, 2012; Zhang et al, 2013; Song et al, 2015; Zhao et al, 2015; Zhang et al, 2015, 2018; Zhu et al, 2017; Wang et al, 2018; Shi et al, 2018a,b; Mo et al, 2020; Meng et al, 2020; Cai et al, 2021; Fang et al, 2021). All of these methods consider a single-agent setup in which only one agent exists in the environment.…”