Going beyond methods that only fine-tune a learned initialization using online interaction [40,25,31], we consider two independent fine-tuning settings: (1) a fully offline setting, in which we fine-tune the pre-trained policy without any online interaction, and (2) a setting in which a limited amount of online interaction is allowed, so that the agent can autonomously acquire the skills needed to solve the task from a challenging initial condition. This resembles the problem setting considered by offline meta-RL methods [33,8,39,45,34]. However, our approach is simpler: we fine-tune with the very same offline RL algorithm that we use for pre-training.
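To make the two settings concrete, the following is a minimal sketch of the control flow, assuming a generic offline RL learner; the `OfflineRLAgent` class, the environment interface, and all function names are hypothetical placeholders for illustration, not the paper's actual implementation. The key point it mirrors is that the same offline RL update rule is used for pre-training and for both fine-tuning settings.

```python
import random


class OfflineRLAgent:
    """Hypothetical stand-in for an offline RL algorithm; the same
    object (and update rule) is reused for pre-training and fine-tuning."""

    def update(self, batch):
        pass  # one gradient step on a batch of offline transitions

    def act(self, obs):
        return 0  # placeholder action selection


def pretrain(agent, multi_task_data, steps):
    # Pre-training: offline RL on a large multi-task dataset.
    for _ in range(steps):
        agent.update(random.choice(multi_task_data))


def finetune_offline(agent, target_task_data, steps):
    # Setting (1): fully offline fine-tuning on a small target-task
    # dataset, with no environment interaction at all.
    for _ in range(steps):
        agent.update(random.choice(target_task_data))


def finetune_online(agent, env, buffer, episodes, steps_per_episode):
    # Setting (2): a limited budget of autonomous online interaction.
    # Collected transitions extend the buffer, and the same offline RL
    # update is applied throughout.
    for _ in range(episodes):
        obs, done = env.reset(), False
        while not done:
            action = agent.act(obs)
            next_obs, reward, done = env.step(action)
            buffer.append((obs, action, reward, next_obs, done))
            obs = next_obs
        for _ in range(steps_per_episode):
            agent.update(random.sample(buffer, k=min(256, len(buffer))))
```

In this sketch, no algorithmic component changes between phases; only the data source differs (multi-task offline data, target-task offline data, or a small amount of autonomously collected experience), which is what distinguishes this approach from offline meta-RL methods that introduce a separate adaptation procedure.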