2022
DOI: 10.48550/arxiv.2205.06175
Preprint

A Generalist Agent

Abstract: Inspired by progress in large-scale language modeling, we apply a similar approach towards building a single generalist agent beyond the realm of text outputs. The agent, which we refer to as Gato, works as a multi-modal, multi-task, multi-embodiment generalist policy. The same network with the same weights can play Atari, caption images, chat, stack blocks with a real robot arm and much more, deciding based on its context whether to output text, joint torques, button presses, or other tokens. In this report w…
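
The core idea in the abstract, serializing text, observations, and actions into one token stream so a single network with fixed weights can handle all of them, can be made concrete with a small sketch. The bin count, vocabulary offsets, and helper names below (NUM_BINS, TEXT_OFFSET, ACTION_OFFSET, discretize, tokenize_timestep) are illustrative assumptions, not the tokenization scheme reported in the paper.

```python
# Illustrative sketch only (not the paper's exact scheme): flatten text tokens,
# discretized continuous observations, and a discrete action into one integer
# sequence that a single autoregressive model could be trained on.

NUM_BINS = 256        # assumed number of bins for continuous values
TEXT_OFFSET = 1000    # assumed vocabulary layout: text ids start at 1000
ACTION_OFFSET = 2000  # assumed layout: discrete action ids start at 2000

def discretize(value, low=-1.0, high=1.0, bins=NUM_BINS):
    """Map a continuous value (e.g. a joint torque) to an integer bin."""
    value = max(low, min(high, value))
    return int((value - low) / (high - low) * (bins - 1))

def tokenize_timestep(text_ids, proprioception, action_id):
    """Flatten one timestep: text first, then discretized observations, then the action."""
    tokens = [TEXT_OFFSET + t for t in text_ids]
    tokens += [discretize(x) for x in proprioception]
    tokens.append(ACTION_OFFSET + action_id)
    return tokens

# Example: a short instruction, three joint readings, and a button press,
# all ending up in one flat token stream for one sequence model.
sequence = tokenize_timestep(text_ids=[7, 42], proprioception=[0.1, -0.5, 0.9], action_id=3)
print(sequence)
```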

Cited by 67 publications (91 citation statements)
References 65 publications

“…To more objectively evaluate the performance of the policies obtained with various hyperparameters and on different data sets, a series of quantitative metrics are proposed in this paper. In general, it can be said that, according to these, recurrent networks operating on complete state histories outperform simple deep neural networks operating in a Markovian regime, which is perhaps unsurprising given the recent successes achieved in applying general-purpose sequence-to-sequence learning methods to the imitation learning domain [Reed et al., 2022]. As our model precision comparisons rely on indirect estimates rather than empirical data, it is hard to make a direct comparison with state-of-the-art performers in throwing tasks such as [Zeng et al., 2020], but it should be noted that they use learning at a higher level (outputting parameters that describe each throw) and do not use model outputs to generate Cartesian motion plans directly as proposed here.…”
Section: Discussion
confidence: 99%
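
The contrast drawn in the statement above, a policy conditioned on the full state history versus one that sees only the current state, can be sketched minimally. The functions, the decay constant, and the toy observations below are assumptions for illustration and do not correspond to the cited architectures.

```python
# Toy contrast (illustrative assumptions only): a memoryless "Markovian" policy
# versus a recurrent-style policy whose hidden state summarizes the history.

def markovian_policy(obs):
    # Decision depends on the current observation alone.
    return 1.0 if obs > 0.0 else -1.0

def recurrent_policy(obs, hidden, decay=0.9):
    # Hidden state is an exponential summary of everything seen so far,
    # standing in for an RNN's learned state update.
    hidden = decay * hidden + (1.0 - decay) * obs
    action = 1.0 if hidden > 0.0 else -1.0
    return action, hidden

observations = [0.6, 0.5, -0.1, -0.05]  # a brief dip that the history smooths over
h = 0.0
for obs in observations:
    a_markov = markovian_policy(obs)
    a_recurrent, h = recurrent_policy(obs, h)
    print(f"obs={obs:+.2f}  markovian={a_markov:+.0f}  recurrent={a_recurrent:+.0f}")
```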
“…Recurrent neural networks show up in earlier scientific literature periodically, such as in predicting a time-series of robot end-effector loads in an assembly task [Scherzinger et al., 2019] and learning latent action plans from large, uncategorized play data sets [Lynch et al., 2020]. But current state-of-the-art performance across a wide variety of sequence prediction tasks, among them imitation learning in a robotics context, is given by combining a large, universal transformer model with embedding schemes specific to various data modalities [Reed et al., 2022]. These results strongly suggest that structuring one's approach to be compatible with general-purpose sequence predictor algorithms is preferable for ensuring its longevity.…”
Section: Related Work
confidence: 99%
“…Few-shot learners: The primary approach today for achieving successful few-shot learning models is to pretrain them on huge, relevant, and diverse datasets and then fine-tune them for the new tasks (Brown et al., 2020; Reed et al., 2022). The problem with this approach is that the pretrained models become specific to the datasets they were trained on (Li et al., 2017; Nalisnick et al., 2019; Yin et al., 2020; Rajendran et al., 2020).…”
Section: Relation To Other Work
confidence: 99%
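
The pretrain-then-fine-tune recipe mentioned in that statement can also be sketched in miniature. The synthetic tasks, the sgd_fit helper, and the learning rates below are assumptions chosen for brevity, not the setup of the cited works.

```python
import random

# Minimal sketch (illustrative assumptions only): fit a scalar linear model
# y = w*x + b on a large "pretraining" task, then adapt it with a few gradient
# steps on a small, related "few-shot" task.

def sgd_fit(data, w, b, lr, epochs):
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return w, b

random.seed(0)
xs = [random.uniform(-1, 1) for _ in range(1000)]
pretrain = [(x, 2 * x + 1 + random.gauss(0, 0.1)) for x in xs]   # y = 2x + 1, noisy
fewshot = [(x, 2 * x + 3) for x in (-0.5, 0.0, 0.5)]             # same slope, shifted intercept

w, b = sgd_fit(pretrain, w=0.0, b=0.0, lr=0.05, epochs=3)        # pretraining
w_ft, b_ft = sgd_fit(fewshot, w, b, lr=0.1, epochs=50)           # few-shot fine-tuning
print(f"pretrained: w={w:.2f} b={b:.2f}   fine-tuned: w={w_ft:.2f} b={b_ft:.2f}")
```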