“…Legged Locomotion: This has conventionally been accomplished using control theory [2,5,6,22,28,31,33,39,55,63,72,88] over handcrafted dynamics models. Recently, RL has been successfully used to learn such policies in simulation [21,49,56,68] and in the real world with sim2real methods [25,29,59,61,75,77,85]. Alternatively, a policy learnt in simulation can be adapted at test-time to work well in real environments [15,19,45,62,70,71,89,90,91,92,95].…”