“…. RL methods

RL was used in nine studies (9/14, 64%), with algorithms including A3C, DDPG, DQN, Dueling DQN, HER, PI², PPO, and Rainbow (Chi et al., 2018a, 2020; Behr et al., 2019; You et al., 2019; Kweon et al., 2021; Meng et al., 2021, 2022; Cho et al., 2022; Karstensen et al., 2022). Demonstrator data in some form (GAIL, Behavior Cloning, or HD) was used as a precursor during training (LfD), in conjunction with other RL algorithms, in four of the studies (4/14, 29%) (Chi et al., 2018a; Behr et al., 2019; Kweon et al., 2021; Cho et al., 2022).…”
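The LfD pattern mentioned above — demonstrator data used as a precursor to RL training — can be illustrated with behavior cloning, i.e. supervised regression on demonstrated state-action pairs. The sketch below is a minimal, made-up linear example (the demonstrator, dimensions, and data are all illustrative assumptions, not any cited study's actual pipeline); the cloned policy would then serve as the starting point for RL fine-tuning with an algorithm such as PPO or DDPG.

```python
import numpy as np

# Hypothetical demonstrator data: states and the expert's actions.
# (All names, dimensions, and dynamics here are illustrative.)
rng = np.random.default_rng(0)
true_w = np.array([[1.0, -0.5], [0.3, 2.0]])      # unknown expert mapping
states = rng.normal(size=(200, 2))                 # 200 demonstrated states
actions = states @ true_w + rng.normal(scale=0.01, size=(200, 2))

# Behavior cloning = supervised learning: fit a policy that imitates the
# demonstrator's state -> action mapping (here, ordinary least squares).
w_bc, *_ = np.linalg.lstsq(states, actions, rcond=None)

# The cloned policy approximates the expert and can be used to
# initialize subsequent RL fine-tuning.
def policy(s):
    return s @ w_bc

print(np.allclose(w_bc, true_w, atol=0.1))
```

In the surveyed studies this pretraining step reduces the amount of costly environment interaction the RL stage needs, since the policy starts near expert behavior instead of from random exploration.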