2022
DOI: 10.48550/arxiv.2206.04779
Preprint

Challenges and Opportunities in Offline Reinforcement Learning from Visual Observations

Cited by 4 publications (10 citation statements)
References 0 publications
“…The latent model discussed above is somewhat reminiscent of the ones used in model-based RL policy training methods, e.g., recurrent state space model (RSSM) used in PlaNet (Hafner et al, 2019) and Dreamer (Hafner et al, 2020a;b), as well as similar ones in Lee et al (2020); Lu et al (2022). Such methods rely on a growing experience buffer for training, which is collected online by the target policy that is being concurrently updated (with exploration noise added); however, OPE aims to extrapolate returns from a fixed set of offline trajectories which may result in limited coverage of the state and action space.…”
Section: Recurrent State Alignment
confidence: 99%
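The contrast drawn in this excerpt lends itself to a short illustration. Below is a minimal Python sketch using hypothetical env/policy/buffer interfaces (none of these names come from the cited papers): online model-based training grows its experience buffer with rollouts from the policy being concurrently updated, while OPE must extrapolate returns from a frozen set of logged trajectories.

import numpy as np

def collect_online(env, policy, buffer, noise_std=0.1):
    """Online RL: roll out the current policy (plus exploration noise)
    and append the fresh transitions, so the buffer keeps growing."""
    obs, done = env.reset(), False
    while not done:
        action = policy(obs) + np.random.normal(0.0, noise_std)
        next_obs, reward, done, _ = env.step(action)
        buffer.append((obs, action, reward, next_obs))
        obs = next_obs

def ope_direct_estimate(offline_trajectories, gamma=0.99):
    """OPE: the dataset is fixed; here, a naive Monte-Carlo average of
    discounted returns over the logged trajectories. Coverage is limited
    to whatever the behavior policy happened to visit."""
    returns = []
    for traj in offline_trajectories:  # traj: list of (obs, action, reward)
        g = sum(gamma ** t * r for t, (_, _, r) in enumerate(traj))
        returns.append(g)
    return float(np.mean(returns))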
“…Exogenous noise with low-diversity and no time correlation. a) Visual offline datasets from v-d4rl benchmark (Lu et al, 2022b) without any background distractors; b) Distractor setting (Lu et al, 2022a) with a single fixed exogenous image in the background.…”
Section: Related Work
confidence: 99%
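The distractor setting this excerpt describes, a single fixed exogenous image composited into the background of every observation, can be sketched in a few lines. This is an illustrative reconstruction with assumed array shapes and a hypothetical helper name, not code from either cited benchmark.

import numpy as np

def add_fixed_distractor(frames, distractor, alpha=0.5):
    """Blend one fixed exogenous image into the background of every frame.
    frames: (T, H, W, C) uint8 observations; distractor: (H, W, C) uint8,
    identical across time, so the noise has no temporal variation."""
    blended = ((1.0 - alpha) * frames.astype(np.float32)
               + alpha * distractor.astype(np.float32)[None])
    return blended.clip(0.0, 255.0).astype(np.uint8)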
“…We provide details of each EXOGENOUS DATASETS in Appendix E, along with descriptions for the data collection process. Following ; Lu et al (2022b), we release these datasets for future use by the RL community. All experiments involve pre-training the representation, and then freezing it for use in an offline RL algorithm.…”
Section: Related Work
confidence: 99%
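The two-stage protocol in this excerpt, pre-train a visual representation and then freeze it for use in an offline RL algorithm, is easy to mis-implement by letting gradients leak into the encoder. Below is a minimal PyTorch sketch with stand-in modules (the encoder architecture, head, and pre-training objective are all hypothetical placeholders, not the cited paper's setup).

import torch
import torch.nn as nn

encoder = nn.Sequential(  # stand-in for any pre-trained visual encoder
    nn.Conv2d(3, 32, 3, stride=2), nn.ReLU(),
    nn.Conv2d(32, 32, 3, stride=2), nn.ReLU(),
    nn.Flatten(),
)

# Stage 1: representation pre-training on the offline observations
# (e.g. a self-supervised objective) would happen here; omitted.

# Stage 2: freeze the encoder so the offline RL algorithm never updates it.
for p in encoder.parameters():
    p.requires_grad = False
encoder.eval()

policy_head = nn.LazyLinear(out_features=4)  # action logits on frozen features

def policy(obs):  # obs: (B, 3, H, W) float tensor
    with torch.no_grad():
        z = encoder(obs)  # frozen representation
    return policy_head(z)  # only the head receives gradients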