2022
DOI: 10.48550/arxiv.2206.04779
Preprint

Challenges and Opportunities in Offline Reinforcement Learning from Visual Observations

Cited by 4 publications (10 citation statements)
References 0 publications
“…The latent model discussed above is somewhat reminiscent of the ones used in model-based RL policy training methods, e.g., recurrent state space model (RSSM) used in PlaNet (Hafner et al, 2019) and Dreamer (Hafner et al, 2020a;b), as well as similar ones in Lee et al (2020); Lu et al (2022). Such methods rely on a growing experience buffer for training, which is collected online by the target policy that is being concurrently updated (with exploration noise added); however, OPE aims to extrapolate returns from a fixed set of offline trajectories which may result in limited coverage of the state and action space.…”
Section: Recurrent State Alignment
confidence: 99%
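The contrast drawn in this excerpt lends itself to a short illustration. Below is a minimal Python sketch using hypothetical env/policy/buffer interfaces (none of these names come from the cited papers): online model-based training grows its experience buffer with rollouts from the policy being concurrently updated, while OPE must extrapolate returns from a frozen set of logged trajectories.

import numpy as np

def collect_online(env, policy, buffer, noise_std=0.1):
    """Online RL: roll out the current policy (plus exploration noise)
    and append the fresh transitions, so the buffer keeps growing."""
    obs, done = env.reset(), False
    while not done:
        action = policy(obs) + np.random.normal(0.0, noise_std)
        next_obs, reward, done, _ = env.step(action)
        buffer.append((obs, action, reward, next_obs))
        obs = next_obs

def ope_direct_estimate(offline_trajectories, gamma=0.99):
    """OPE: the dataset is fixed; here, a naive Monte-Carlo average of
    discounted returns over the logged trajectories. Coverage is limited
    to whatever the behavior policy happened to visit."""
    returns = []
    for traj in offline_trajectories:  # traj: list of (obs, action, reward)
        g = sum(gamma ** t * r for t, (_, _, r) in enumerate(traj))
        returns.append(g)
    return float(np.mean(returns))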
“…Exogenous noise with low-diversity and no time correlation. a) Visual offline datasets from v-d4rl benchmark (Lu et al, 2022b) without any background distractors; b) Distractor setting (Lu et al, 2022a) with a single fixed exogenous image in the background.…”
Section: Related Work
confidence: 99%
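The distractor setting this excerpt describes, a single fixed exogenous image composited into the background of every observation, can be sketched in a few lines. This is an illustrative reconstruction with assumed array shapes and a hypothetical helper name, not code from either cited benchmark.

import numpy as np

def add_fixed_distractor(frames, distractor, alpha=0.5):
    """Blend one fixed exogenous image into the background of every frame.
    frames: (T, H, W, C) uint8 observations; distractor: (H, W, C) uint8,
    identical across time, so the noise has no temporal variation."""
    blended = ((1.0 - alpha) * frames.astype(np.float32)
               + alpha * distractor.astype(np.float32)[None])
    return blended.clip(0.0, 255.0).astype(np.uint8)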
“…We provide details of each EXOGENOUS DATASETS in Appendix E, along with descriptions for the data collection process. Following ; Lu et al (2022b), we release these datasets for future use by the RL community. All experiments involve pre-training the representation, and then freezing it for use in an offline RL algorithm.…”
Section: Related Work
confidence: 99%
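The two-stage protocol in this excerpt, pre-train a visual representation and then freeze it for use in an offline RL algorithm, is easy to mis-implement by letting gradients leak into the encoder. Below is a minimal PyTorch sketch with stand-in modules (the encoder architecture, head, and pre-training objective are all hypothetical placeholders, not the cited paper's setup).

import torch
import torch.nn as nn

encoder = nn.Sequential(  # stand-in for any pre-trained visual encoder
    nn.Conv2d(3, 32, 3, stride=2), nn.ReLU(),
    nn.Conv2d(32, 32, 3, stride=2), nn.ReLU(),
    nn.Flatten(),
)

# Stage 1: representation pre-training on the offline observations
# (e.g. a self-supervised objective) would happen here; omitted.

# Stage 2: freeze the encoder so the offline RL algorithm never updates it.
for p in encoder.parameters():
    p.requires_grad = False
encoder.eval()

policy_head = nn.LazyLinear(out_features=4)  # action logits on frozen features

def policy(obs):  # obs: (B, 3, H, W) float tensor
    with torch.no_grad():
        z = encoder(obs)  # frozen representation
    return policy_head(z)  # only the head receives gradients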