“…Prior work has proposed a number of benchmarks for reinforcement learning, which are often either explicitly episodic (Todorov et al., 2012; Beattie et al., 2016; Chevalier-Boisvert et al., 2018) or consist of games that are implicitly episodic, ending when the player dies or completes the game (Bellemare et al., 2013; Silver et al., 2016). In addition, RL benchmarks have been proposed in the episodic setting for studying a number of orthogonal questions, such as multi-task learning (Bellemare et al., 2013; Yu et al., 2020), sequential task learning (Wołczyk et al., 2021), generalization (Cobbe et al., 2020), and multi-agent learning (Samvelyan et al., 2019). These benchmarks differ from our own in that we propose to study the challenge of autonomy.…”