“…While encouraging results have been achieved with Model Predictive Control (MPC) and trajectory optimization [24,10,18,9,19,26,4,75], these methods require in-depth knowledge of the environment and substantial manual parameter tuning, which makes them difficult to apply in complex environments. Alternatively, model-free RL can learn general policies for tasks on challenging terrain [43,90,53,63,64,77,35,46,85,36,38,84,44]. For example, Xie et al. [85] use dynamics randomization to generalize an RL locomotion policy across different environments, and Peng et al. [64] use animal videos as demonstrations for imitation learning.…”
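The dynamics-randomization idea mentioned above can be sketched as resampling physical parameters at the start of each training episode so the learned policy cannot overfit to one fixed environment. The following is a minimal illustrative sketch, not the cited authors' implementation: the parameter names, their ranges, and the placeholder episode loop are all assumptions for illustration.

```python
import random

def sample_dynamics(rng):
    # Resample physical parameters each episode (ranges are illustrative).
    return {
        "mass_scale": rng.uniform(0.8, 1.2),   # scale on link masses
        "friction": rng.uniform(0.5, 1.5),     # ground friction coefficient
        "latency": rng.choice([0, 1, 2]),      # actuation delay in timesteps
    }

def train_with_randomization(num_episodes, seed=0):
    rng = random.Random(seed)
    sampled = []
    for _ in range(num_episodes):
        dyn = sample_dynamics(rng)
        sampled.append(dyn)
        # ... configure the simulator with `dyn`, then run one RL
        # episode and update the policy (omitted placeholder) ...
    return sampled

params = train_with_randomization(3)
```

Because the policy is trained across the whole sampled distribution rather than a single nominal model, it tends to transfer better to environments whose dynamics fall inside (or near) the randomized ranges.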