2022
DOI: 10.48550/arxiv.2205.15241
Preprint

Multi-Game Decision Transformers

Abstract: A longstanding goal of the field of AI is a strategy for compiling diverse experience into a highly capable, generalist agent. In the subfields of vision and language, this was largely achieved by scaling up transformer-based models and training them on large, diverse datasets. Motivated by this progress, we investigate whether the same strategy can be used to produce generalist reinforcement learning agents. Specifically, we show that a single transformer-based model - with a single set of weights - trained pur…

Cited by 7 publications (28 citation statements)
References 34 publications
“…where t is a time step and R is the return for the remaining sequence. The sequence we consider here is similar to the one used in [30], except that we do not include reward as part of the sequence, and we predict an additional quantity R that enables us to estimate an optimal input length, which we cover in the following paragraphs. Figure 2 presents an overview of our model architecture.…”
Section: Reinforcement Learning As Sequence Modeling
confidence: 99%
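The quoted passage describes a Decision Transformer-style formulation in which a trajectory is flattened into a return-conditioned token sequence. The sketch below is only a rough illustration of that idea, not the cited paper's code: it builds such a sequence from a toy trajectory, and the interleaving order, helper names, and the choice to compute R_t as the undiscounted return-to-go are assumptions.

```python
# Minimal sketch (illustrative, not the authors' implementation): building a
# return-conditioned token sequence in the Decision Transformer style.
# Per-step rewards are omitted from the sequence, matching the quoted variant.
import numpy as np

def build_sequence(states, actions, rewards):
    """Interleave (return-to-go, state, action) triples for one trajectory."""
    returns_to_go = np.cumsum(rewards[::-1])[::-1]  # R_t = sum of future rewards
    tokens = []
    for R_t, s_t, a_t in zip(returns_to_go, states, actions):
        tokens.extend([("return", R_t), ("state", s_t), ("action", a_t)])
    return tokens

# Example: a 3-step toy trajectory.
states = [np.zeros(4), np.ones(4), np.full(4, 2.0)]
actions = [0, 1, 0]
rewards = [0.0, 1.0, 1.0]
print(build_sequence(states, actions, rewards)[:3])
```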
“…Our training method extends the work of [30] by estimating the maximum expected return value for a trajectory using Equation 2. This estimation aids in comparing expected returns of different trajectories over various history lengths.…”
Section: Training Objective For Maximum In-Support Return
confidence: 99%
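Equation 2 of the citing paper is not reproduced in this excerpt, so the following is only a hedged sketch of the general idea it describes: an auxiliary head predicts the expected return of the remaining sequence alongside the usual action prediction, so that predicted returns can be compared across different history lengths. Module names, shapes, and the unweighted sum of the two losses are assumptions.

```python
# Hedged sketch (not Equation 2 from the citing paper): joint action-prediction
# and return-prediction losses on top of per-timestep hidden states `h`.
import torch
import torch.nn as nn
import torch.nn.functional as F

hidden_dim, num_actions = 32, 18

# Transformer backbone omitted; `h` stands in for its per-timestep hidden states.
action_head = nn.Linear(hidden_dim, num_actions)
return_head = nn.Linear(hidden_dim, 1)   # predicts R, the return of the remaining sequence

def loss_fn(h, target_actions, target_returns):
    logits = action_head(h)                               # (batch, time, num_actions)
    pred_R = return_head(h).squeeze(-1)                   # (batch, time)
    action_loss = F.cross_entropy(logits.flatten(0, 1), target_actions.flatten())
    return_loss = F.mse_loss(pred_R, target_returns)      # auxiliary return-prediction term
    return action_loss + return_loss

# Toy check: 2 trajectories, 5 timesteps each.
h = torch.randn(2, 5, hidden_dim)
loss = loss_fn(h, torch.randint(0, num_actions, (2, 5)), torch.randn(2, 5))
print(loss.item())
```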
“…When learning a predictive information representation between a state-action pair and its subsequent state, the learning task is equivalent to modeling environment dynamics [4]. In this work, we are interested in training multi-task generalist agents [5,6,7] that can master a wide range of robotics skills in both simulated and real environments by learning from a large amount of diverse experience. We hypothesize that modeling the predictive information will give latent representations that capture environment dynamics across multiple tasks, making it simpler and more efficient to learn a generalist policy.…”
Section: Introduction
confidence: 99%
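As a loose illustration of the quoted idea (not the cited method), the sketch below trains an encoding of a (state, action) pair to be predictive of the next state's encoding with an InfoNCE-style contrastive loss, which is one way to make the representation capture environment dynamics. The encoder architecture, dimensions, and loss form are assumptions.

```python
# Hedged sketch: contrastive predictive-information objective between (s_t, a_t)
# and s_{t+1}. Positives are matching transitions within the batch.
import torch
import torch.nn as nn
import torch.nn.functional as F

state_dim, action_dim, latent_dim = 8, 2, 16
encode_sa = nn.Linear(state_dim + action_dim, latent_dim)   # encodes (s_t, a_t)
encode_next = nn.Linear(state_dim, latent_dim)              # encodes s_{t+1}

def predictive_info_loss(s, a, s_next):
    z = F.normalize(encode_sa(torch.cat([s, a], dim=-1)), dim=-1)
    z_next = F.normalize(encode_next(s_next), dim=-1)
    logits = z @ z_next.t()                  # similarity of every (s,a) with every s'
    labels = torch.arange(s.shape[0])        # matching transitions lie on the diagonal
    return F.cross_entropy(logits, labels)

# Toy batch of 4 transitions.
s, a, s_next = torch.randn(4, state_dim), torch.randn(4, action_dim), torch.randn(4, state_dim)
print(predictive_info_loss(s, a, s_next).item())
```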