2020
DOI: 10.48550/arxiv.2010.02903
Preprint

Keep CALM and Explore: Language Models for Action Generation in Text-based Games

Abstract: Text-based games present a unique challenge for autonomous agents to operate in natural language and handle enormous action spaces. In this paper, we propose the Contextual Action Language Model (CALM) to generate a compact set of action candidates at each game state. Our key insight is to train language models on human gameplay, where people demonstrate linguistic priors and a general game sense for promising actions conditioned on game history. We combine CALM with a reinforcement learning agent which re-ranks the generated action candidates to maximize in-game rewards.
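The abstract's generate-then-re-rank loop can be summarized in a short sketch: a language model trained on human gameplay proposes a handful of action strings for the current game history, and an RL scorer picks among them (in the paper, a DRRN-style agent trained on in-game rewards). The GPT-2 checkpoint name, the QNetwork, and the "[ACT]" separator below are illustrative assumptions, not the authors' released code.

```python
# Minimal sketch of CALM-style action generation followed by RL re-ranking.
# The checkpoint, QNetwork, and separator token are stand-ins for illustration.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
lm = GPT2LMHeadModel.from_pretrained("gpt2")

def generate_candidates(history: str, k: int = 30, max_new_tokens: int = 6):
    """Sample k short action strings conditioned on the game history."""
    inputs = tokenizer(history, return_tensors="pt")
    outputs = lm.generate(
        **inputs,
        do_sample=True,
        top_k=40,
        num_return_sequences=k,
        max_new_tokens=max_new_tokens,
        pad_token_id=tokenizer.eos_token_id,
    )
    prompt_len = inputs["input_ids"].shape[1]
    actions = [tokenizer.decode(o[prompt_len:], skip_special_tokens=True).strip()
               for o in outputs]
    return list(dict.fromkeys(a for a in actions if a))  # dedupe, drop empties

class QNetwork(torch.nn.Module):
    """Hypothetical re-ranker: scores a (history, action) pair with a scalar Q-value."""
    def __init__(self, vocab_size: int, dim: int = 64):
        super().__init__()
        self.embed = torch.nn.Embedding(vocab_size, dim)
        self.head = torch.nn.Linear(dim, 1)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        return self.head(self.embed(token_ids).mean(dim=1)).squeeze(-1)

q_net = QNetwork(vocab_size=tokenizer.vocab_size)

def act(history: str) -> str:
    """Pick the candidate with the highest estimated Q-value (greedy for brevity;
    during training the RL agent would update q_net from in-game rewards)."""
    scored = []
    for a in generate_candidates(history):
        ids = tokenizer(history + " [ACT] " + a, return_tensors="pt")["input_ids"]
        scored.append((q_net(ids).item(), a))
    return max(scored)[1]
```

The division of labor is the point of the design: the language model contributes linguistic priors and game sense from human play, while the learned Q-function only has to rank a compact candidate set rather than search the full action space.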

Citations: cited by 4 publications (6 citation statements)
References: 11 publications
“…Narasimhan et al. (2015) demonstrate that "Deep-Q Networks" (DQN) (Mnih et al., 2015) can be applied to text-based games, and later work (2020) shows that the Go-Explore algorithm (Ecoffet et al., 2019), which periodically returns to promising but underexplored areas of a world, can achieve higher scores than the DRRN with fewer steps.

Agent | scores on six benchmark games ("-" = not reported)
(Fulda et al., 2017a) | 0.59 | 0.03 | 0.00 | 0.10 | 0.00 | 0.01
Golovin (Kostka et al., 2017) | 0.20 | 0.04 | 0.10 | 0.15 | 0.00 | 0.01
AE-DQN (Zahavy et al., 2018) | - | 0.05 | - | - | - | -
NeuroAgent (Rajalingam and Samothrakis, 2019) | 0.19 | 0.03 | 0.00 | 0.20 | 0.00 | 0.00
NAIL (Hausknecht et al., 2019) | 0.38 | 0.03 | 0.26 | - | 0.00 | 0.00
CNN-DQN (Yin and May, 2019a) | - | 0.11 | - | - | - | -
IK-OMP (Tessler et al., 2019) | - | 1.00 | - | - | - | -
TDQN | 0.47 | 0.03 | 0.00 | 0.34 | 0.02 | 0.00
KG-A2C | 0.58 | 0.10 | 0.01 | 0.06 | 0.03 | 0.01
SC (Jain et al., 2020) | - | 0.10 | - | - | 0.0 | -
CALM (N-gram) (Yao et al., 2020) | 0.79 | 0.07 | 0.00 | 0.09 | 0.00 | 0.00
CALM (GPT-2) (Yao et al., 2020) | 0.80 | 0.09 | 0.07 | 0.14 | 0.05 | 0.01
RC-DQN (Guo et al., 2020a) | 0.81 | 0.11 | 0.40 | 0.20 | 0.05 | 0.02
MPRC-DQN (Guo et al., 2020a) | 0.88 | 0.11 | 0.52 | 0.20 | 0.05 | 0.02
SHA-KG (Xu et al., 2020) | 0.86 | 0.10 | 0.10 | - | 0.05 | 0.02
MC!Q*BERT (Ammanabrolu et al., 2020b) | 0.92 | 0.12 | - | - | 0.00 | -
INV-DY (Yao et al., 2021) | 0.81 | 0.12 | 0.06 | 0.11 | 0.05 | -

To support these modelling paradigms, Zelinka et al. (2019) introduce TextWorld KG, a dataset for learning the subtask of updating knowledge graphs based on text world descriptions in a cooking domain, and show that their best ensemble model achieves 70 F1 on this subtask. Similarly, Ammanabrolu et al. (2021a) introduce JerichoWorld, a similar dataset for world modeling using knowledge graphs but on a broader set of interactive fiction games, and subsequently introduce Worldformer (Ammanabrolu and Riedl, 2021b), a multitask transformer model that performs well at both knowledge-graph prediction and next-action prediction tasks.…”
Section: Text World Agents (mentioning, confidence: 99%)
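The Go-Explore idea cited in this excerpt (return to a promising but under-visited state, then explore onward from it) reduces to a small archive loop when the environment exposes save/restore, as Jericho-style interpreters do. Everything below — env.save(), env.restore(), env.valid_actions(), the cell abstraction, and the selection heuristic — is an illustrative assumption rather than the published algorithm's exact settings.

```python
# Rough sketch of a Go-Explore-style loop for a text game with save/restore.
import random
from collections import defaultdict

archive = {}               # cell_key -> (saved_state, best_score_at_that_cell)
visits = defaultdict(int)  # how often each cell was chosen as a return target

def cell_key(observation: str) -> str:
    """Coarse state abstraction: here, just the first line (e.g. the location name)."""
    return observation.split("\n")[0]

def select_cell():
    """Prefer promising (high-score) but under-explored (rarely visited) cells."""
    return max(archive, key=lambda c: archive[c][1] - 0.5 * visits[c])

def go_explore(env, episodes: int = 100, horizon: int = 30):
    obs, score = env.reset(), 0                      # assumed env API
    archive[cell_key(obs)] = (env.save(), score)
    for _ in range(episodes):
        cell = select_cell()
        visits[cell] += 1
        saved, score = archive[cell]
        env.restore(saved)                           # "go" back to the promising cell
        for _ in range(horizon):                     # then "explore" from it
            obs, reward, done = env.step(random.choice(env.valid_actions()))
            score += reward
            key = cell_key(obs)
            if key not in archive or score > archive[key][1]:
                archive[key] = (env.save(), score)   # keep the best way to reach each cell
            if done:
                break
```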
“…Some IF games are very linear, having a clear progression from start to finish (e.g., Acorn Court, Detective); others have huge maps that an agent has to explore before it can progress in the quest (e.g., Zork, Hitchhiker's Guide to the Galaxy). Exploration heuristics are a part of some successful methods for playing IF with RL (Yao et al., 2020). ScienceWorld (Wang et al., 2022) has an underlying physics engine that allows a combinatorial explosion of possibilities, such as making new objects, combining existing objects, and changing states of matter.…”
Section: A23 Attribute (mentioning, confidence: 99%)
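A common form of the exploration heuristic this excerpt alludes to is a count-based novelty bonus: the agent earns extra intrinsic reward for reaching observations it has rarely seen, which matters little in linear games but a lot in large-map ones. The sketch below is a generic version of that idea, not the specific heuristic of any cited system.

```python
# Generic count-based exploration bonus for a text game: intrinsic reward decays
# with how often a (hashed) observation has been visited. Purely illustrative.
from collections import Counter

visit_counts = Counter()

def exploration_bonus(observation: str, beta: float = 0.1) -> float:
    """Intrinsic reward proportional to beta / sqrt(visit count) for this observation."""
    key = hash(observation)   # crude state abstraction; real agents use better keys
    visit_counts[key] += 1
    return beta / visit_counts[key] ** 0.5

# Typical use: shape the environment reward before the RL update, e.g.
# shaped_reward = reward + exploration_bonus(obs)
```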
“…Episodic memory stores experience from earlier decision cycles. This can consist of training input-output pairs (Rubin et al., 2021), history event flows (Weston et al., 2014), game trajectories from previous episodes (Yao et al., 2020; Tuyls et al., 2022), or other representations of the agent's experiences. During the planning stage of a decision cycle, these episodes may be retrieved into working memory to support reasoning.…”
Section: Memory (mentioning, confidence: 99%)
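As a deliberately simplified illustration of this memory pattern, the sketch below stores whole game trajectories after each episode and retrieves the most similar ones when a new observation arrives; the token-overlap retriever and the Episode fields are assumptions, not any particular agent's implementation.

```python
# Episodic memory sketch: store trajectories, retrieve similar ones for planning.
from dataclasses import dataclass, field

@dataclass
class Episode:
    observations: list      # text observations seen during the episode
    actions: list           # actions taken at each step
    total_reward: float     # final score, useful for preferring successful episodes

@dataclass
class EpisodicMemory:
    episodes: list = field(default_factory=list)

    def store(self, episode: Episode) -> None:
        self.episodes.append(episode)

    def retrieve(self, current_obs: str, k: int = 3) -> list:
        """Return the k stored episodes whose observations best overlap the current one."""
        def overlap(ep: Episode) -> int:
            seen = set(" ".join(ep.observations).split())
            return len(seen & set(current_obs.split()))
        return sorted(self.episodes, key=overlap, reverse=True)[:k]

# During a decision cycle, retrieved episodes are placed in working memory,
# e.g. prepended to the planner / LLM prompt before choosing the next action.
```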
“…If multiple actions are proposed, the evaluation sub-stage assigns a value to each. This may use heuristic rules, LLM (perplexity) values (Ahn et al., 2022), learned values (Yao et al., 2020), LLM reasoning (Hao et al., 2023), or some combination. In particular, LLM reasoning can help evaluate actions by internally simulating their grounding feedback from the external world (Hao et al., 2023).…”
Section: Decision Making (mentioning, confidence: 99%)
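One way to read this evaluation sub-stage in code: score each proposed action with a mixture of an LLM plausibility term (approximated here by GPT-2's average log-likelihood of the action given the state) and a learned value estimate. The value_fn argument and the fixed mixing weight are assumptions for illustration; the cited systems each use their own scorers.

```python
# Sketch of the evaluation sub-stage: combine LM plausibility with a learned value.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
lm = GPT2LMHeadModel.from_pretrained("gpt2")

def lm_log_likelihood(state: str, action: str) -> float:
    """Mean log-probability of the action tokens given the state (higher = more plausible)."""
    state_ids = tokenizer(state, return_tensors="pt")["input_ids"]
    full_ids = tokenizer(state + " " + action, return_tensors="pt")["input_ids"]
    labels = full_ids.clone()
    labels[:, : state_ids.shape[1]] = -100          # ignore the state prefix in the loss
    with torch.no_grad():
        loss = lm(full_ids, labels=labels).loss     # mean NLL over the action tokens
    return -loss.item()

def evaluate(state: str, actions: list, value_fn, weight: float = 0.5) -> list:
    """Mix LM plausibility and a learned value; return actions sorted best-first."""
    scored = [(weight * lm_log_likelihood(state, a) + (1 - weight) * value_fn(state, a), a)
              for a in actions]
    return [a for _, a in sorted(scored, reverse=True)]
```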
“…Language representations are inherently low-dimensional and flexible at different levels of abstraction (Piantadosi, Tenenbaum, & Goodman, 2016; Antonello, Turek, Vo, & Huth, 2021). Several recent studies in artificial reinforcement learning have built on this insight to demonstrate the benefits of augmenting RL agents with linguistic information (Yao, Rao, Hausknecht, & Narasimhan, 2020; Tuyls, Yao, Kakade, & Narasimhan, 2022).…”
Section: Introduction (mentioning, confidence: 99%)