“…For the MuJoCo-gym environments, we only consider results reported on the v1 version of the respective environment up to 2019, since the earliest publication of the latest v1 result we found (Abdolmaleki et al., 2018a) appeared in December 2018; for 2019 and 2020, we include results that use v2 or an ambiguous version. Overall, we considered TRPO (Schulman et al., 2015), DDPG (Lillicrap et al., 2015), Q-Prop (Gu et al., 2016), Soft Q-learning (Haarnoja et al., 2017), ACKTR (Wu et al., 2017), PPO (Schulman et al., 2017), Clipped Action Policy Gradients (Fujita & Maeda, 2018), TD3 (Fujimoto et al., 2018), STEVE (Buckman et al., 2018), SAC (Haarnoja et al., 2018), and Relative Entropy Regularized Policy Iteration (Abdolmaleki et al., 2018a) for gym-v1.…”
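
For context on the version distinction drawn above: in OpenAI Gym, the suffix of an environment ID (e.g. `-v1` vs. `-v2`) pins a specific revision of the task, and revisions can differ in dynamics or reward, so scores obtained on different suffixes are not directly comparable. A minimal sketch, assuming a `gym` release that still registers the v1 MuJoCo tasks and a working MuJoCo installation (recent releases only ship later revisions):

```python
# Hedged illustration of gym's environment versioning; the specific IDs
# assume an older gym release with mujoco-py, as newer releases dropped v1.
import gym

env_v1 = gym.make("HalfCheetah-v1")  # task revision used by results up to 2019
env_v2 = gym.make("HalfCheetah-v2")  # later revision; results not comparable to v1
```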