“…Since it is generally challenging to learn a single RL policy that performs various tasks [28], many prior works focus on learning a single-goal policy [6,42,49,74] for legged robots, such as walking forward at a constant speed [16,31,69]. There have been efforts to obtain more versatile policies, such as walking at different velocities with different gaits while following different commands [17,18,35,54], but these require more extensive reward tuning due to the lack of a gait prior. Providing the robot with different reference motions for different goals can help, but this requires additional parameterization of the reference motions (e.g., a gait library) [3,24,27,37], policy distillation [70], or a motion prior [15,50,67].…”