2022
DOI: 10.48550/arxiv.2202.02929
Preprint

Model-Based Offline Meta-Reinforcement Learning with Regularization

Abstract: Existing offline reinforcement learning (RL) methods face a few major challenges, particularly the distributional shift between the learned policy and the behavior policy. Offline Meta-RL is emerging as a promising approach to address these challenges, aiming to learn an informative meta-policy from a collection of tasks. Nevertheless, as shown in our empirical studies, offline Meta-RL could be outperformed by offline single-task RL methods on tasks with good quality of datasets, indicating that a right balanc…
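For intuition about the trade-off the abstract describes, between "exploiting" the offline dataset by staying close to the behavior policy and "exploring" by following a meta-learned policy, here is a minimal, generic sketch of a behavior-regularized policy loss in PyTorch. This is not the paper's actual algorithm; the tensor names (q_values, log_pi, log_behavior, log_meta) and the weights alpha and beta are illustrative assumptions.

# Minimal sketch (not the paper's method): a generic behavior-regularized
# policy loss that balances following a meta-learned prior against staying
# close to the behavior policy that generated the offline data.
import torch

def regularized_policy_loss(q_values, log_pi, log_behavior, log_meta,
                            alpha=0.5, beta=0.5):
    """All arguments are hypothetical per-sample tensors.

    q_values:     Q(s, a) under the learned critic for actions sampled from the policy
    log_pi:       log-probability of those actions under the learned policy
    log_behavior: log-probability under (an estimate of) the behavior policy
    log_meta:     log-probability under the meta-policy / task prior
    alpha:        weight on exploiting the dataset (stay near the behavior policy)
    beta:         weight on exploring as guided by the meta-policy
    """
    # Maximize the critic's value while penalizing divergence from both anchors;
    # (log_pi - log_x) is a sample-based KL-style penalty for actions drawn from pi.
    exploit_term = alpha * (log_pi - log_behavior)
    explore_term = beta * (log_pi - log_meta)
    return (-q_values + exploit_term + explore_term).mean()

Larger alpha keeps the policy conservative and close to the data; larger beta lets the meta-policy pull it toward out-of-distribution actions, which is the balance the abstract argues must be calibrated per task.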

Cited by 1 publication (1 citation statement)
References: 6 publications (15 reference statements)
“…Going beyond methods that only perform fine-tuning from a learned initialization with online interaction [40,25,31], we consider two independent fine-tuning settings: (1) the setting where we do not use any online interaction and fine-tune the pre-trained policy entirely offline, (2) the setting where a limited amount of online interaction is allowed to autonomously acquire the skills to solve the task from a challenging initial condition. This resembles the problem setting considered by offline meta-RL methods [33,8,39,45,34]. However, our approach is simpler as we fine-tune the very same offline RL algorithm that we use for pre-training.…”
Section: Introduction
Confidence: 99%
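To make the two fine-tuning settings in the quoted passage concrete, a rough sketch follows. Every interface used here (offline_rl_update, task_dataset.sample, policy.act, buffer.add, buffer.sample) is a hypothetical placeholder, not an API from the cited works; the point is only that the same offline RL update is reused for pre-training and for both fine-tuning regimes.

# Setting (1): no online interaction; fine-tune entirely on the task's offline data.
def finetune_offline(policy, offline_rl_update, task_dataset, steps=10_000):
    for _ in range(steps):
        batch = task_dataset.sample()              # hypothetical dataset/replay API
        policy = offline_rl_update(policy, batch)  # same update rule as pre-training
    return policy

# Setting (2): a limited budget of online interaction fills the buffer,
# and the very same offline RL update is then applied to it.
def finetune_with_limited_online(policy, offline_rl_update, env, buffer,
                                 online_steps=5_000):
    obs = env.reset()
    for _ in range(online_steps):
        action = policy.act(obs)                   # hypothetical policy API
        next_obs, reward, done, _ = env.step(action)
        buffer.add(obs, action, reward, next_obs, done)
        obs = env.reset() if done else next_obs
        policy = offline_rl_update(policy, buffer.sample())
    return policy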