“…There have been a number of recent advances in meta-optimization for multi-task RL (Kim, Yoon, Dia, Kim, Bengio, and Ahn, 2018; Rothfuss, Lee, Clavera, Asfour, and Abbeel, 2018; Flennerhag, Moreno, Lawrence, and Damianou, 2018; Nagabandi, Finn, and Levine, 2018; Mendonca, Gupta, Kralev, Abbeel, Levine, and Finn, 2019; Finn, Rajeswaran, Kakade, and Levine, 2019; Lin, Thomas, Yang, and Ma, 2020; Berseth, Zhang, Zhang, Finn, and Levine, 2021; Co-Reyes, Miao, Peng, Real, Levine, Le, Lee, and Faust, 2021b; Kirsch, Flennerhag, van Hasselt, Friesen, Oh, and Chen, 2022; Wan, Peng, and Gangwani, 2022; Melo, 2022; Nam, Sun, Pertsch, Hwang, and Lim, 2022) and in multi-agent RL (Foerster et al., 2018a; Kim et al., 2021a; Al-Shedivat, Bansal, Burda, Sutskever, Mordatch, and Abbeel, 2017). Moreover, another group of recent approaches focuses on learning a meta-critic, which explicitly guides updates to the agent's policy rather than simply guiding its actions (Harb et al., 2020; Sung, Zhang, Xiang, Hospedales, and Yang, 2017; Xu, Cao, and Chen, 2019).…”