2022
DOI: 10.48550/arxiv.2204.05080
Preprint

Semantic Exploration from Language Abstractions and Pretrained Representations

Abstract: Continuous first-person 3D environments pose unique exploration challenges to reinforcement learning (RL) agents, because of their high-dimensional state and action spaces. These challenges can be ameliorated by using semantically meaningful state abstractions to define novelty for exploration. We propose that learned representations shaped by natural language provide exactly this form of abstraction. In particular, we show that vision-language representations, when pretrained on image captioning datasets samp…
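
The abstract describes a concrete mechanism: embed each observation with a pretrained vision-language model and reward the agent for reaching states that are distant, in embedding space, from those it has already visited. The sketch below shows one minimal way such a bonus could be computed; it is not the paper's implementation, and `encode` is a hypothetical stand-in for any pretrained image encoder (e.g. the vision tower of a CLIP-style model).

```python
import numpy as np

class SemanticNoveltyBonus:
    """Episodic novelty bonus computed in a pretrained embedding space.

    A sketch, not the paper's method: `encode` is assumed to map an
    observation to a fixed-length embedding vector.
    """

    def __init__(self, encode, k=10):
        self.encode = encode  # observation -> 1-D embedding (assumed)
        self.k = k            # number of nearest neighbours to average over
        self.memory = []      # embeddings seen so far this episode

    def reset(self):
        # Clear episodic memory at the start of each episode.
        self.memory.clear()

    def bonus(self, observation):
        # Embed and unit-normalise the current observation.
        z = np.asarray(self.encode(observation), dtype=np.float64)
        z = z / (np.linalg.norm(z) + 1e-8)
        if not self.memory:
            self.memory.append(z)
            return 1.0  # nothing seen yet: maximally novel
        # Novelty = mean distance to the k nearest stored embeddings.
        dists = np.sort([np.linalg.norm(z - m) for m in self.memory])
        self.memory.append(z)
        return float(np.mean(dists[: self.k]))
```

An agent would add a scaled `bonus(observation)` to the environment reward at each step, so states that look semantically unfamiliar to the pretrained encoder attract extra exploration.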

Cited by 13 publications (9 citation statements)
References 27 publications (43 reference statements)
“…One drawback of DECKARD, along with many other LLM-assisted RL methods, is that it requires an environment already be grounded in language. Some preliminary methods for generating state descriptions from images are used by Tam et al (2022), but this remains an open area of research.…”
Section: Discussion
confidence: 99%
“…In this work, we focus on using LLMs for exploration rather than directly generating action plans. Tam et al (2022) and Mu et al (2022) recently demonstrated that language is a meaningful state abstraction when used for exploration. Additionally, Tam et al (2022) experiment with using LLM latent representations of state descriptions for novelty exploration, relying on pretrained LLM encodings to detect novel textual states.…”
Section: Language-assisted Decision Making
confidence: 99%
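
The statement above outlines the same mechanism from the text side: embed textual state descriptions with a pretrained language model and treat a description as novel when no previously stored embedding is sufficiently similar. The sketch below is a hedged illustration of that idea; `embed_text` is a hypothetical stand-in for any pretrained text encoder, and the similarity threshold is illustrative rather than taken from the paper.

```python
import numpy as np

def is_novel(description, seen, embed_text, threshold=0.9):
    """Return True if `description` embeds far from all stored states.

    `embed_text` (assumed) maps a string to a fixed-length vector;
    `seen` is a list of unit-normalised embeddings observed so far.
    """
    z = np.asarray(embed_text(description), dtype=np.float64)
    z = z / (np.linalg.norm(z) + 1e-8)
    for m in seen:
        if float(np.dot(z, m)) >= threshold:  # cosine similarity on unit vectors
            return False  # too close to a state already seen
    seen.append(z)  # remember this state's embedding
    return True
```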
“…Baselines. We compare against CLIP (Radford et al, 2021), a state-of-the-art vision-language representation that has seen wide adoption in various robotics tasks (Shridhar et al, 2022a; Cui et al, 2022; Khandelwal et al, 2022; Tam et al, 2022); as LIV is trained using the CLIP architecture and initialized with CLIP weights, this is the closest comparison. We also compare against R3M (Nair et al, 2022b) and VIP (Ma et al, 2022b), two state-of-the-art pre-trained visual representations.…”
Section: Pre-trained LIV as Representation
confidence: 99%
“…One cognitive function that exhibits many of the properties thought to be important for state representation is language (Tam et al, 2022). Language representations are inherently low-dimensional and flexible at different levels of abstraction (Piantadosi, Tenenbaum, & Goodman, 2016; Antonello, Turek, Vo, & Huth, 2021).…”
Section: Introduction
confidence: 99%