Using Natural Language for Reward Shaping in Reinforcement Learning

Goyal, Prasoon; Niekum, Scott; Mooney, Raymond J.

doi:10.24963/ijcai.2019/331

Cited by 50 publications

(59 citation statements)

References 7 publications


“…As illustrated in Figure 1, we separate the literature into language-conditional RL (in which interaction with language is necessitated by the problem formulation itself) and language-assisted RL (in which language is used to facilitate learning). The two categories are not mutually exclusive, in that for some language-conditional RL tasks, NLP methods or additional textual corpora are used to assist learning [Goyal et al, 2019].…”

Section: Current Use Of Natural Language In RL (mentioning)

confidence: 99%

“…Methods developed for language-conditional tasks are relevant for language-assisted RL as they both deal with the problem of grounding natural language sentences in the context of RL. Moreover, in tasks such as following sequences of instructions, the full instructions are often not necessary to solve the underlying RL problem but they assist learning by structuring the policy [Andreas et al, 2017] or by providing auxiliary rewards [Goyal et al, 2019].…”

Section: Language-conditional RL (mentioning)

confidence: 99%

“…More recently, with the developments in deep learning, a common approach has been to embed both the instruction and observation to condition the policy directly [Mei et al, 2016; Hermann et al, 2017; Chaplot et al, 2018; Janner et al, 2018; Misra et al, 2017]. Human-generated natural language instructions are used in [MacMahon et al, 2006; Bisk et al, 2016; Misra et al, 2017; Janner et al, 2018; Anderson et al, 2018; Goyal et al, 2019]. Due to data-efficiency limitations of RL, this is not a standard in RL-based research [Hermann et al, 2017].…”

Section: Instruction Following (mentioning)

confidence: 99%

“…When environment rewards are available but sparse, instructions may still be used to generate auxiliary rewards to help learn efficiently. In this setting, [Goyal et al, 2019] and use auxiliary reward-learning modules trained offline to predict whether trajectory segments correspond to natural language annotations of expert trajectories. [Agarwal et al, 2019] perform a meta-optimisation to learn auxiliary rewards conditioned on features extracted from instructions.…”

Section: Rewards From Instructions (mentioning)

confidence: 99%

“…Textual information can assist learning by specifying informative features, annotating states or entities in the environment, or describing subtasks in a multitask setting. In most cases covered here, the textual information is task-specific, with a few cases of using task-independent information through language parsers [Branavan et al, 2012] and pre-trained sentence embeddings [Goyal et al, 2019].…”

Section: Language-assisted RL (mentioning)

confidence: 99%


A Survey of Reinforcement Learning Informed by Natural Language

Luketina, Nardelli, Farquhar, et al. 2019

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence


To be successful in real-world tasks, Reinforcement Learning (RL) needs to exploit the compositional, relational, and hierarchical structure of the world, and learn to transfer it to the task at hand. Recent advances in representation learning for language make it possible to build models that acquire world knowledge from text corpora and integrate this knowledge into downstream decision making problems. We thus argue that the time is right to investigate a tight integration of natural language understanding into RL in particular. We survey the state of the field, including work on instruction following, text games, and learning from textual domain knowledge. Finally, we call for the development of new environments as well as further investigation into the potential uses of recent Natural Language Processing (NLP) techniques for such tasks.

