2023
DOI: 10.48550/arxiv.2303.05479
Preprint

Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning

Abstract: A compelling use case of offline reinforcement learning (RL) is to obtain a policy initialization from existing datasets, which allows efficient fine-tuning with limited amounts of active online interaction. However, several existing offline RL methods tend to exhibit poor online fine-tuning performance. On the other hand, online RL methods can learn effectively through online interaction, but struggle to incorporate offline data, which can make them very slow in settings where exploration is challenging or pre-training is…

Cited by 4 publications (6 citation statements) | References 29 publications
“…The combination of offline and online RL techniques has emerged as a promising research direction. In works like [2,3,8], offline RL has been used to train a policy from a pre-collected dataset of experiences that is then fine-tuned with online RL. These studies have investigated diverse strategies aimed at improving the performance gain of offline pre-training and mitigating the phenomenon known as policy collapse, which causes a performance dip when shifting from offline to online training [3].…”
Section: Combining Offline and Online RL
Citation type: mentioning, confidence: 99%
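To make the pre-train-offline-then-fine-tune recipe referenced in the statement above concrete, here is a minimal sketch using a toy tabular Q-learning agent; the chain environment, dataset size, and hyperparameters are illustrative assumptions rather than details from any of the cited works, and the sketch deliberately omits the conservatism/calibration machinery those papers add to avoid the offline-to-online performance dip.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 5, 2, 0.95

def step(s, a):
    # Toy chain MDP used only for illustration: action 1 moves right, reward at the last state.
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    return s_next, float(s_next == n_states - 1)

# Offline phase: a static dataset of transitions, here generated by a random behavior policy.
offline_data, s = [], 0
for _ in range(2000):
    a = int(rng.integers(n_actions))
    s_next, r = step(s, a)
    offline_data.append((s, a, r, s_next))
    s = s_next

Q = np.zeros((n_states, n_actions))

def td_update(q, batch, lr=0.1):
    for s, a, r, s_next in batch:
        q[s, a] += lr * (r + gamma * q[s_next].max() - q[s, a])

# 1) Offline RL pre-training: fit Q only on the pre-collected dataset.
for _ in range(50):
    td_update(Q, offline_data)

# 2) Online fine-tuning: keep improving with fresh interaction,
#    mixing new transitions into the (growing) replay buffer.
replay, s = list(offline_data), 0
for t in range(2000):
    a = int(Q[s].argmax()) if rng.random() > 0.1 else int(rng.integers(n_actions))
    s_next, r = step(s, a)
    replay.append((s, a, r, s_next))
    td_update(Q, [replay[i] for i in rng.integers(len(replay), size=64)])
    s = s_next

print("greedy policy after fine-tuning:", Q.argmax(axis=1))
```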
“…These approaches propose measures such as reducing the underestimation during offline stages [8], imposing a conservative improvement to the online stage [3], and weighting policy improvement with the advantage function [2].…”
Section: Combining Offline and Online RL
Citation type: mentioning, confidence: 99%
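As a concrete instance of the last of those measures, below is a hedged sketch of an advantage-weighted (AWAC-style) policy update; the network shape, temperature, and placeholder batch are illustrative assumptions, and in the cited methods the advantages would come from a learned critic rather than random data.

```python
import torch
import torch.nn as nn

# Illustrative dimensions and temperature; not taken from the cited papers.
obs_dim, n_actions, temperature = 8, 4, 1.0

policy = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)

def advantage_weighted_update(obs, actions, advantages):
    """Advantage-weighted policy improvement: a behavior-cloning loss
    reweighted by exp(A / temperature), so better-than-average actions
    in the data receive a larger gradient weight."""
    log_probs = torch.log_softmax(policy(obs), dim=-1)
    chosen_log_probs = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
    weights = torch.exp(advantages / temperature).clamp(max=100.0)  # clamp for stability
    loss = -(weights.detach() * chosen_log_probs).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage with random placeholder data standing in for a replay batch.
obs = torch.randn(32, obs_dim)
actions = torch.randint(n_actions, (32,))
advantages = torch.randn(32)  # in practice: Q(s, a) - V(s) from a learned critic
print(advantage_weighted_update(obs, actions, advantages))
```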
“…The data available for RL consists of a combination of on-policy samples and potentially suboptimal near-expert interventions, which necessitates using a suitable off-policy RL algorithm that can incorporate prior (near-expert) data easily but also can efficiently improve with online experience. While a variety of algorithms designed for online RL with offline data could be suitable (Song et al., 2022; Lee et al., 2022; Nakamoto et al., 2023), we adopt the recently proposed RLPD algorithm (Ball et al., 2023), which has shown compelling results on sample-efficient robotic learning. RLPD is an off-policy actor-critic reinforcement learning algorithm that builds on soft actor-critic (Haarnoja et al., 2018), but makes some key modifications to satisfy the desiderata above, such as a high update-to-data ratio, layer-norm regularization during training, and using ensembles of value functions, which make it more suitable for incorporating offline data into online RL.…”
Section: Interactive Imitation Learning As Reinforcement Learning
Citation type: mentioning, confidence: 99%
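A minimal sketch of the ingredients named in that description (LayerNorm-regularized critics, a small ensemble of value functions, and a high update-to-data ratio); the layer sizes, ensemble size, UTD value, and placeholder batches are illustrative assumptions, not RLPD's actual hyperparameters or implementation.

```python
import torch
import torch.nn as nn

obs_dim, act_dim = 17, 6          # illustrative dimensions
ensemble_size, utd_ratio = 2, 20  # illustrative; "high UTD" means many critic updates per env step

def make_critic():
    # LayerNorm after each hidden layer regularizes training, as described above.
    return nn.Sequential(
        nn.Linear(obs_dim + act_dim, 256), nn.LayerNorm(256), nn.ReLU(),
        nn.Linear(256, 256), nn.LayerNorm(256), nn.ReLU(),
        nn.Linear(256, 1),
    )

critics = nn.ModuleList([make_critic() for _ in range(ensemble_size)])
optimizer = torch.optim.Adam(critics.parameters(), lr=3e-4)

def critic_update(batch_obs, batch_act, td_target):
    # One gradient step on every ensemble member toward a shared TD target.
    inputs = torch.cat([batch_obs, batch_act], dim=-1)
    loss = sum(((c(inputs).squeeze(-1) - td_target) ** 2).mean() for c in critics)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# High update-to-data ratio: after each environment step, take utd_ratio critic updates
# on sampled batches (random placeholder data here).
for _ in range(utd_ratio):
    obs = torch.randn(256, obs_dim)
    act = torch.randn(256, act_dim)
    target = torch.randn(256)  # in practice: r + gamma * a pessimistic aggregate over target critics
    critic_update(obs, act, target)
```

In RLPD-style training, batches are typically drawn from both the offline dataset and the online replay buffer, which is how the prior data is folded into online learning.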
“…Consequently, offline datasets alone cannot provide enough information on the safety constraints in the environment, and thus offline training is not sufficient for safe RL. It, therefore, strengthened the necessity of continuing to improve the decision-making policy by an online finetuning process with interactions in task environments [24]-[26].…”
Section: Introduction
Citation type: mentioning, confidence: 99%
“…Prior work [25], [26] typically uses the offline trained policy network architecture for online finetuning. Unfortunately, DT's transformer-based policy network, with its numerous parameters, can often fall short of meeting computation speed requirements in real-world tasks like autonomous driving.…”
Section: Introduction
Citation type: mentioning, confidence: 99%