2020
DOI: 10.48550/arxiv.2010.02903
Preprint

Keep CALM and Explore: Language Models for Action Generation in Text-based Games

Abstract: Text-based games present a unique challenge for autonomous agents to operate in natural language and handle enormous action spaces. In this paper, we propose the Contextual Action Language Model (CALM) to generate a compact set of action candidates at each game state. Our key insight is to train language models on human gameplay, where people demonstrate linguistic priors and a general game sense for promising actions conditioned on game history. We combine CALM with a reinforcement learning agent which re-ranks the generated action candidates to maximize in-game rewards.
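The abstract's generate-then-re-rank loop can be summarized in a short sketch: a language model trained on human gameplay proposes a handful of action strings for the current game history, and an RL scorer picks among them (in the paper, a DRRN-style agent trained on in-game rewards). The GPT-2 checkpoint name, the QNetwork, and the "[ACT]" separator below are illustrative assumptions, not the authors' released code.

```python
# Minimal sketch of CALM-style action generation followed by RL re-ranking.
# The checkpoint, QNetwork, and separator token are stand-ins for illustration.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
lm = GPT2LMHeadModel.from_pretrained("gpt2")

def generate_candidates(history: str, k: int = 30, max_new_tokens: int = 6):
    """Sample k short action strings conditioned on the game history."""
    inputs = tokenizer(history, return_tensors="pt")
    outputs = lm.generate(
        **inputs,
        do_sample=True,
        top_k=40,
        num_return_sequences=k,
        max_new_tokens=max_new_tokens,
        pad_token_id=tokenizer.eos_token_id,
    )
    prompt_len = inputs["input_ids"].shape[1]
    actions = [tokenizer.decode(o[prompt_len:], skip_special_tokens=True).strip()
               for o in outputs]
    return list(dict.fromkeys(a for a in actions if a))  # dedupe, drop empties

class QNetwork(torch.nn.Module):
    """Hypothetical re-ranker: scores a (history, action) pair with a scalar Q-value."""
    def __init__(self, vocab_size: int, dim: int = 64):
        super().__init__()
        self.embed = torch.nn.Embedding(vocab_size, dim)
        self.head = torch.nn.Linear(dim, 1)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        return self.head(self.embed(token_ids).mean(dim=1)).squeeze(-1)

q_net = QNetwork(vocab_size=tokenizer.vocab_size)

def act(history: str) -> str:
    """Pick the candidate with the highest estimated Q-value (greedy for brevity;
    during training the RL agent would update q_net from in-game rewards)."""
    scored = []
    for a in generate_candidates(history):
        ids = tokenizer(history + " [ACT] " + a, return_tensors="pt")["input_ids"]
        scored.append((q_net(ids).item(), a))
    return max(scored)[1]
```

The division of labor is the point of the design: the language model contributes linguistic priors and game sense from human play, while the learned Q-function only has to rank a compact candidate set rather than search the full action space.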

Citations: cited by 4 publications (6 citation statements)
References: 11 publications
“…Narasimhan et al. (2015) demonstrate that "Deep-Q Networks" (DQN) (Mnih et al., 2015) can be applied to text-based games, and later work (2020) shows that the Go-Explore algorithm (Ecoffet et al., 2019), which periodically returns to promising but underexplored areas of a world, can achieve higher scores than the DRRN with fewer steps.

Agent | scores on six benchmark games ("-" = not reported)
(Fulda et al., 2017a) | 0.59 | 0.03 | 0.00 | 0.10 | 0.00 | 0.01
Golovin (Kostka et al., 2017) | 0.20 | 0.04 | 0.10 | 0.15 | 0.00 | 0.01
AE-DQN (Zahavy et al., 2018) | - | 0.05 | - | - | - | -
NeuroAgent (Rajalingam and Samothrakis, 2019) | 0.19 | 0.03 | 0.00 | 0.20 | 0.00 | 0.00
NAIL (Hausknecht et al., 2019) | 0.38 | 0.03 | 0.26 | - | 0.00 | 0.00
CNN-DQN (Yin and May, 2019a) | - | 0.11 | - | - | - | -
IK-OMP (Tessler et al., 2019) | - | 1.00 | - | - | - | -
TDQN | 0.47 | 0.03 | 0.00 | 0.34 | 0.02 | 0.00
KG-A2C | 0.58 | 0.10 | 0.01 | 0.06 | 0.03 | 0.01
SC (Jain et al., 2020) | - | 0.10 | - | - | 0.0 | -
CALM (N-gram) (Yao et al., 2020) | 0.79 | 0.07 | 0.00 | 0.09 | 0.00 | 0.00
CALM (GPT-2) (Yao et al., 2020) | 0.80 | 0.09 | 0.07 | 0.14 | 0.05 | 0.01
RC-DQN (Guo et al., 2020a) | 0.81 | 0.11 | 0.40 | 0.20 | 0.05 | 0.02
MPRC-DQN (Guo et al., 2020a) | 0.88 | 0.11 | 0.52 | 0.20 | 0.05 | 0.02
SHA-KG (Xu et al., 2020) | 0.86 | 0.10 | 0.10 | - | 0.05 | 0.02
MC!Q*BERT (Ammanabrolu et al., 2020b) | 0.92 | 0.12 | - | - | 0.00 | -
INV-DY (Yao et al., 2021) | 0.81 | 0.12 | 0.06 | 0.11 | 0.05 | -

To support these modelling paradigms, Zelinka et al. (2019) introduce TextWorld KG, a dataset for learning the subtask of updating knowledge graphs based on text world descriptions in a cooking domain, and show that their best ensemble model achieves 70 F1 on this subtask. Similarly, Ammanabrolu et al. (2021a) introduce JerichoWorld, a similar dataset for world modeling using knowledge graphs but on a broader set of interactive fiction games, and subsequently introduce Worldformer (Ammanabrolu and Riedl, 2021b), a multitask transformer model that performs well at both knowledge-graph prediction and next-action prediction tasks.…”
Section: Text World Agents (mentioning, confidence: 99%)
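The Go-Explore idea cited in this excerpt (return to a promising but under-visited state, then explore onward from it) reduces to a small archive loop when the environment exposes save/restore, as Jericho-style interpreters do. Everything below — env.save(), env.restore(), env.valid_actions(), the cell abstraction, and the selection heuristic — is an illustrative assumption rather than the published algorithm's exact settings.

```python
# Rough sketch of a Go-Explore-style loop for a text game with save/restore.
import random
from collections import defaultdict

archive = {}               # cell_key -> (saved_state, best_score_at_that_cell)
visits = defaultdict(int)  # how often each cell was chosen as a return target

def cell_key(observation: str) -> str:
    """Coarse state abstraction: here, just the first line (e.g. the location name)."""
    return observation.split("\n")[0]

def select_cell():
    """Prefer promising (high-score) but under-explored (rarely visited) cells."""
    return max(archive, key=lambda c: archive[c][1] - 0.5 * visits[c])

def go_explore(env, episodes: int = 100, horizon: int = 30):
    obs, score = env.reset(), 0                      # assumed env API
    archive[cell_key(obs)] = (env.save(), score)
    for _ in range(episodes):
        cell = select_cell()
        visits[cell] += 1
        saved, score = archive[cell]
        env.restore(saved)                           # "go" back to the promising cell
        for _ in range(horizon):                     # then "explore" from it
            obs, reward, done = env.step(random.choice(env.valid_actions()))
            score += reward
            key = cell_key(obs)
            if key not in archive or score > archive[key][1]:
                archive[key] = (env.save(), score)   # keep the best way to reach each cell
            if done:
                break
```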
“…Some IF games are very linear, having a clear progression from start to finish (e.g., Acorn Court, Detective); others have huge maps that an agent has to explore before it can progress in the quest (e.g., Zork, Hitchhiker's Guide to the Galaxy). Exploration heuristics are a part of some successful methods for playing IF with RL (Yao et al., 2020). ScienceWorld (Wang et al., 2022) has an underlying physics engine that allows a combinatorial explosion of possibilities, such as making new objects, combining existing objects, and changing states of matter.…”
Section: A23 Attribute (mentioning, confidence: 99%)
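A common form of the exploration heuristic this excerpt alludes to is a count-based novelty bonus: the agent earns extra intrinsic reward for reaching observations it has rarely seen, which matters little in linear games but a lot in large-map ones. The sketch below is a generic version of that idea, not the specific heuristic of any cited system.

```python
# Generic count-based exploration bonus for a text game: intrinsic reward decays
# with how often a (hashed) observation has been visited. Purely illustrative.
from collections import Counter

visit_counts = Counter()

def exploration_bonus(observation: str, beta: float = 0.1) -> float:
    """Intrinsic reward proportional to beta / sqrt(visit count) for this observation."""
    key = hash(observation)   # crude state abstraction; real agents use better keys
    visit_counts[key] += 1
    return beta / visit_counts[key] ** 0.5

# Typical use: shape the environment reward before the RL update, e.g.
# shaped_reward = reward + exploration_bonus(obs)
```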
“…Episodic memory stores experience from earlier decision cycles. This can consist of training input-output pairs (Rubin et al., 2021), history event flows (Weston et al., 2014), game trajectories from previous episodes (Yao et al., 2020; Tuyls et al., 2022), or other representations of the agent's experiences. During the planning stage of a decision cycle, these episodes may be retrieved into working memory to support reasoning.…”
Section: Memory (mentioning, confidence: 99%)
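As a deliberately simplified illustration of this memory pattern, the sketch below stores whole game trajectories after each episode and retrieves the most similar ones when a new observation arrives; the token-overlap retriever and the Episode fields are assumptions, not any particular agent's implementation.

```python
# Episodic memory sketch: store trajectories, retrieve similar ones for planning.
from dataclasses import dataclass, field

@dataclass
class Episode:
    observations: list      # text observations seen during the episode
    actions: list           # actions taken at each step
    total_reward: float     # final score, useful for preferring successful episodes

@dataclass
class EpisodicMemory:
    episodes: list = field(default_factory=list)

    def store(self, episode: Episode) -> None:
        self.episodes.append(episode)

    def retrieve(self, current_obs: str, k: int = 3) -> list:
        """Return the k stored episodes whose observations best overlap the current one."""
        def overlap(ep: Episode) -> int:
            seen = set(" ".join(ep.observations).split())
            return len(seen & set(current_obs.split()))
        return sorted(self.episodes, key=overlap, reverse=True)[:k]

# During a decision cycle, retrieved episodes are placed in working memory,
# e.g. prepended to the planner / LLM prompt before choosing the next action.
```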
“…If multiple actions are proposed, the evaluation sub-stage assigns a value to each. This may use heuristic rules, LLM (perplexity) values (Ahn et al., 2022), learned values (Yao et al., 2020), LLM reasoning (Hao et al., 2023), or some combination. In particular, LLM reasoning can help evaluate actions by internally simulating their grounding feedback from the external world (Hao et al., 2023).…”
Section: Decision Making (mentioning, confidence: 99%)
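One way to read this evaluation sub-stage in code: score each proposed action with a mixture of an LLM plausibility term (approximated here by GPT-2's average log-likelihood of the action given the state) and a learned value estimate. The value_fn argument and the fixed mixing weight are assumptions for illustration; the cited systems each use their own scorers.

```python
# Sketch of the evaluation sub-stage: combine LM plausibility with a learned value.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
lm = GPT2LMHeadModel.from_pretrained("gpt2")

def lm_log_likelihood(state: str, action: str) -> float:
    """Mean log-probability of the action tokens given the state (higher = more plausible)."""
    state_ids = tokenizer(state, return_tensors="pt")["input_ids"]
    full_ids = tokenizer(state + " " + action, return_tensors="pt")["input_ids"]
    labels = full_ids.clone()
    labels[:, : state_ids.shape[1]] = -100          # ignore the state prefix in the loss
    with torch.no_grad():
        loss = lm(full_ids, labels=labels).loss     # mean NLL over the action tokens
    return -loss.item()

def evaluate(state: str, actions: list, value_fn, weight: float = 0.5) -> list:
    """Mix LM plausibility and a learned value; return actions sorted best-first."""
    scored = [(weight * lm_log_likelihood(state, a) + (1 - weight) * value_fn(state, a), a)
              for a in actions]
    return [a for _, a in sorted(scored, reverse=True)]
```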
“…Language representations are inherently low-dimensional and flexible at different levels of abstraction (Piantadosi, Tenenbaum, & Goodman, 2016; Antonello, Turek, Vo, & Huth, 2021). Several recent studies in artificial reinforcement learning have built on this insight to demonstrate the benefits of augmenting RL agents with linguistic information (Yao, Rao, Hausknecht, & Narasimhan, 2020; Tuyls, Yao, Kakade, & Narasimhan, 2022).…”
Section: Introduction (mentioning, confidence: 99%)