Tell Me Dave: Context-Sensitive Grounding of Natural Language to Manipulation Instructions

Misra, Dipendra; Sung, Jaeyong; Lee, Kevin; Saxena, Ashutosh

doi:10.15607/rss.2014.x.005

Cited by 29 publications

(4 citation statements)

References 40 publications

(61 reference statements)


“…Frameworks for learning from language-based communication have been previously proposed. Common approaches include: reducing the learning problem to reinforcement learning [16,18,21,37,39,55], grounding language to demonstration [6,9,34,35,36,38,45,59], or devising EM-based algorithms to parse language into logical forms [30,41]. The first approach may discard useful learning signals from language feedback and inherits the limitations of RL algorithms.…”

Section: Related Work (mentioning)

confidence: 99%

Interactive Learning from Activity Description

Nguyen, Misra, Schapire et al. 2021

Preprint


We present a novel interactive learning protocol that enables training request-fulfilling agents by verbally describing their activities. Our protocol gives rise to a new family of interactive learning algorithms that offer complementary advantages against traditional algorithms like imitation learning (IL) and reinforcement learning (RL). We develop an algorithm that practically implements this protocol and employ it to train agents in two challenging request-fulfilling problems using purely language-description feedback. Empirical results demonstrate the strengths of our algorithm: compared to RL baselines, it is more sample-efficient; compared to IL baselines, it achieves competitive success rates while not requiring feedback providers to have agent-specific expertise. We also provide theoretical guarantees of the algorithm under certain assumptions on the teacher and the environment.



“…This approach requires reward functions (the task completion detector) to be hand-designed for the training tasks considered. Another related approach is semantic parsing, which has also been used to convert language into an executable form that corresponds to actions within an environment (Forbes et al, 2015; Misra et al, 2014; Tellex et al, 2011). In a related task to instruction following, Das et al (2018) consider an embodied question-answering task where an agent must produce an answer to a question, where the relevant information lies within the environment.…”

Section: Related Work (mentioning)

confidence: 99%

From Language to Goals: Inverse Reinforcement Learning for Vision-Based Instruction Following

Fu, Korattikara, Levine et al. 2019

Preprint


Reinforcement learning is a promising framework for solving control problems, but its use in practical situations is hampered by the fact that reward functions are often difficult to engineer. Specifying goals and tasks for autonomous machines, such as robots, is a significant challenge: conventionally, reward functions and goal states have been used to communicate objectives. But people can communicate objectives to each other simply by describing or demonstrating them. How can we build learning algorithms that will allow us to tell machines what we want them to do? In this work, we investigate the problem of grounding language commands as reward functions using inverse reinforcement learning, and argue that language-conditioned rewards are more transferable than language-conditioned policies to new environments. We propose language-conditioned reward learning (LC-RL), which grounds language commands as a reward function represented by a deep neural network. We demonstrate that our model learns rewards that transfer to novel tasks and environments on realistic, high-dimensional visual environments with natural language commands, whereas directly learning a language-conditioned policy leads to poor performance.


“…Robotics. Controlling a robot with textual commands is one of the central problems in robotics, e.g., [Guadarrama et al 2013; Misra et al 2014; Tenorth and Beetz 2013]. Rather than learning to map language to object/scene arrangements, as for Text2Scene, the key problem in robotics is to map texts to robot motions, as well as robot-object interactions.…”

Section: Related Work (mentioning)

confidence: 99%

Language-driven synthesis of 3D scenes from scene databases

Patil, Fisher et al. 2018

ACM Trans. Graph.


We introduce a novel framework for using natural language to generate and edit 3D indoor scenes, harnessing scene semantics and text-scene grounding knowledge learned from large annotated 3D scene databases. The advantage of natural language editing interfaces is strongest when performing semantic operations at the sub-scene level, acting on groups of objects. We learn how to manipulate these sub-scenes by analyzing existing 3D scenes. We perform edits by first parsing a natural language command from the user and transforming it into a semantic scene graph that is used to retrieve corresponding sub-scenes from the databases that match the command. We then augment this retrieved sub-scene by incorporating other objects that may be implied by the scene context. Finally, a new 3D scene is synthesized by aligning the augmented sub-scene with the user's current scene, where new objects are spliced into the environment, possibly triggering appropriate adjustments to the existing scene arrangement. A suggestive modeling interface with multiple interpretations of user commands is used to alleviate ambiguities in natural language. We conduct studies comparing our approach against both prior text-to-scene work and artist-made scenes and find that our method significantly outperforms prior work and is comparable to handmade scenes even when complex and varied natural sentences are used.

