2021
DOI: 10.48550/arxiv.2107.02729
Preprint

AdaRL: What, Where, and How to Adapt in Transfer Reinforcement Learning

Abstract: Most approaches in reinforcement learning (RL) are data-hungry and specific to fixed environments. In this paper, we propose a principled framework for adaptive RL, called AdaRL, that adapts reliably to changes across domains. Specifically, we construct a generative environment model for the structural relationships among variables in the system and embed the changes in a compact way, which provides a clear and interpretable picture for locating what and where the changes are and how to adapt. Based on the env…

Cited by 7 publications (15 citation statements)
References 32 publications (27 reference statements)
“…We evaluate LiLY on the modified cartpole (Huang et al., 2021) video dataset and compare the performances with LEAP. Modified Cartpole is a nonlinear dynamical system with cart positions x_t and pole angles θ_t as the true state variables.…”
Section: Causal Discovery From Videos
confidence: 99%
“…Modified Cartpole The Cartpole problem (Huang et al., 2021) "consists of a cart and a vertical pendulum attached to the cart using a passive pivot joint. The cart can move left or right.…”
Section: S212 Real-world Dataset
confidence: 99%
“…This can be leveraged to learn dynamics models that explicitly ignore irrelevant factors in prediction or to compute improved sample complexity bounds for policy learning [42], and seems particularly relevant for generalisation as additional structure in the context set could map onto the factored structure in the transition and reward functions. An initial example of using a similar formalism to a factored MDP in a multi-domain RL setting is [43], although it does not target the zero-shot policy transfer setting directly. We hope to see more work applying these kinds of structural assumptions to generalisation problems.…”
Section: Additional Assumptions For More Feasible Generalisation
confidence: 99%