2022
DOI: 10.48550/arxiv.2204.05080
Preprint

Semantic Exploration from Language Abstractions and Pretrained Representations

Abstract: Continuous first-person 3D environments pose unique exploration challenges to reinforcement learning (RL) agents, because of their high-dimensional state and action spaces. These challenges can be ameliorated by using semantically meaningful state abstractions to define novelty for exploration. We propose that learned representations shaped by natural language provide exactly this form of abstraction. In particular, we show that vision-language representations, when pretrained on image captioning datasets samp…
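
The abstract describes a concrete mechanism: embed each observation with a pretrained vision-language model and reward the agent for reaching states that are distant, in embedding space, from those it has already visited. The sketch below shows one minimal way such a bonus could be computed; it is not the paper's implementation, and `encode` is a hypothetical stand-in for any pretrained image encoder (e.g. the vision tower of a CLIP-style model).

```python
import numpy as np

class SemanticNoveltyBonus:
    """Episodic novelty bonus computed in a pretrained embedding space.

    A sketch, not the paper's method: `encode` is assumed to map an
    observation to a fixed-length embedding vector.
    """

    def __init__(self, encode, k=10):
        self.encode = encode  # observation -> 1-D embedding (assumed)
        self.k = k            # number of nearest neighbours to average over
        self.memory = []      # embeddings seen so far this episode

    def reset(self):
        # Clear episodic memory at the start of each episode.
        self.memory.clear()

    def bonus(self, observation):
        # Embed and unit-normalise the current observation.
        z = np.asarray(self.encode(observation), dtype=np.float64)
        z = z / (np.linalg.norm(z) + 1e-8)
        if not self.memory:
            self.memory.append(z)
            return 1.0  # nothing seen yet: maximally novel
        # Novelty = mean distance to the k nearest stored embeddings.
        dists = np.sort([np.linalg.norm(z - m) for m in self.memory])
        self.memory.append(z)
        return float(np.mean(dists[: self.k]))
```

An agent would add a scaled `bonus(observation)` to the environment reward at each step, so states that look semantically unfamiliar to the pretrained encoder attract extra exploration.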

Cited by 13 publications (9 citation statements)
References 27 publications (43 reference statements)
“…One drawback of DECKARD, along with many other LLM-assisted RL methods, is that it requires an environment already be grounded in language. Some preliminary methods for generating state descriptions from images are used by Tam et al (2022), but this remains an open area of research.…”
Section: Discussion
confidence: 99%
“…In this work, we focus on using LLMs for exploration rather than directly generating action plans. Tam et al (2022) and Mu et al (2022) recently demonstrated that language is a meaningful state abstraction when used for exploration. Additionally, Tam et al (2022) experiment with using LLM latent representations of state descriptions for novelty exploration, relying on pretrained LLM encodings to detect novel textual states.…”
Section: Language-assisted Decision Making
confidence: 99%
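
The statement above outlines the same mechanism from the text side: embed textual state descriptions with a pretrained language model and treat a description as novel when no previously stored embedding is sufficiently similar. The sketch below is a hedged illustration of that idea; `embed_text` is a hypothetical stand-in for any pretrained text encoder, and the similarity threshold is illustrative rather than taken from the paper.

```python
import numpy as np

def is_novel(description, seen, embed_text, threshold=0.9):
    """Return True if `description` embeds far from all stored states.

    `embed_text` (assumed) maps a string to a fixed-length vector;
    `seen` is a list of unit-normalised embeddings observed so far.
    """
    z = np.asarray(embed_text(description), dtype=np.float64)
    z = z / (np.linalg.norm(z) + 1e-8)
    for m in seen:
        if float(np.dot(z, m)) >= threshold:  # cosine similarity on unit vectors
            return False  # too close to a state already seen
    seen.append(z)  # remember this state's embedding
    return True
```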
“…Baselines. We compare against CLIP (Radford et al, 2021), a state-of-the-art vision-language representation that has seen wide adoption in various robotics tasks (Shridhar et al, 2022a; Cui et al, 2022; Khandelwal et al, 2022; Tam et al, 2022); as LIV is trained using the CLIP architecture and initialized with CLIP weights, this is the closest comparison. We also compare against R3M (Nair et al, 2022b) and VIP (Ma et al, 2022b), two state-of-the-art pre-trained visual representations.…”
Section: Pre-trained LIV as Representation
confidence: 99%
“…One cognitive function that exhibits many of the properties thought to be important for state representation is language (Tam et al, 2022). Language representations are inherently low-dimensional and flexible at different levels of abstraction (Piantadosi, Tenenbaum, & Goodman, 2016; Antonello, Turek, Vo, & Huth, 2021).…”
Section: Introduction
confidence: 99%