Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/d16-1155
Jointly Learning Grounded Task Structures from Language Instruction and Visual Demonstration

Abstract: To enable language-based communication and collaboration with cognitive robots, this paper presents an approach where an agent can learn task models jointly from language instruction and visual demonstration using an And-Or Graph (AoG) representation. The learned AoG captures a hierarchical task structure where linguistic labels (for language communication) are grounded to corresponding state changes from the physical environment (for perception and action). Our empirical results on a cloth-folding domain have…
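As a rough illustration of the AoG idea described in the abstract, the following Python sketch shows one plausible way to encode a hierarchical task as AND nodes (ordered sub-steps), OR nodes (alternative decompositions), and terminal actions grounded to state changes. All names (AoGNode, leaf, flatten), the toy cloth-folding example, and the state-change strings are assumptions for illustration only, not the authors' implementation.

```python
# Illustrative sketch of an And-Or Graph (AoG) for a hierarchical task.
# Hypothetical structure; the paper additionally learns these graphs jointly
# from language instruction and visual demonstration.

from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AoGNode:
    """A node in a task And-Or Graph.

    An AND node decomposes a task into an ordered sequence of sub-steps;
    an OR node offers alternative ways of achieving the same sub-goal.
    """
    label: str                           # linguistic label, e.g. "fold the sleeve"
    node_type: str                       # "and", "or", or "terminal"
    children: List["AoGNode"] = field(default_factory=list)
    state_change: Optional[str] = None   # grounded effect observed in the demonstration


def leaf(label: str, state_change: str) -> AoGNode:
    """Terminal action grounded to a perceived state change."""
    return AoGNode(label, "terminal", state_change=state_change)


# Toy cloth-folding task: fold both sleeves (in either order), then fold in half.
fold_sleeves = AoGNode(
    "fold the sleeves", "or",
    children=[
        AoGNode("left sleeve first", "and", children=[
            leaf("fold the left sleeve", "left_sleeve: out -> folded"),
            leaf("fold the right sleeve", "right_sleeve: out -> folded"),
        ]),
        AoGNode("right sleeve first", "and", children=[
            leaf("fold the right sleeve", "right_sleeve: out -> folded"),
            leaf("fold the left sleeve", "left_sleeve: out -> folded"),
        ]),
    ],
)

fold_shirt = AoGNode(
    "fold the shirt", "and",
    children=[
        fold_sleeves,
        leaf("fold it in half", "shirt: unfolded -> folded"),
    ],
)


def flatten(node: AoGNode) -> List[str]:
    """Expand one valid execution: all children of AND nodes, first option of OR nodes."""
    if node.node_type == "terminal":
        return [node.label]
    picked = node.children if node.node_type == "and" else node.children[:1]
    return [step for child in picked for step in flatten(child)]


if __name__ == "__main__":
    print(flatten(fold_shirt))
    # ['fold the left sleeve', 'fold the right sleeve', 'fold it in half']
```

The OR node is what lets a single learned graph cover variations across demonstrations (e.g., which sleeve is folded first), while the terminal nodes keep each linguistic label tied to the state change that grounds it.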

Cited by 31 publications (17 citation statements)
References 31 publications (21 reference statements)
“…Originating in the robotics community, learning from demonstration (LfD) (Thomaz and Cakmak, 2009; Argall et al., 2009) enables robots to learn a mapping from world states to robot manipulations based on a human's demonstration of desired robot behaviors. More recent work has also explored the use of natural language and dialogue together with demonstration to teach robots new actions (Mohan and Laird, 2014; Scheutz et al., 2017; Liu et al., 2016; She and Chai, 2017; Chai et al., 2018; Gluck and Laird, 2018).…”
Section: Related Work (mentioning)
Confidence: 99%
“…Procedural text understanding and knowledge extraction (Chu et al., 2017; Park and Motahari Nezhad, 2018; Kiddon et al., 2015; Jermsurawong and Habash, 2015; Liu et al., 2016; Long et al., 2016; Maeta et al., 2015; Malmaud et al., 2014; Artzi and Zettlemoyer, 2013; Kuehne et al., 2017) (Kiddon et al., 2015; Jermsurawong and Habash, 2015), our approach differs as we extract knowledge from the visual signals and transcripts directly, not from imperative recipe texts. Instructional video understanding.…”
Section: Related Work (mentioning)
Confidence: 99%
“…While several methods have been developed for learning the grounding of instructions into logical forms for a robot to carry out a plan [2,3], these do not allow the flexibility required for the type of interaction in (1) and rely on explicit verb forms which are directly grounded in a corresponding action. Even if statistical NLU methods allow for some flexibility in the form, these still only permit a command-and-control Human-Robot Interaction (HRI) with long waiting times and no ability to adjust plans on the fly.…”
Section: A (mentioning)
Confidence: 99%
“…Firstly, we follow [3] in showing how a hierarchical structure can capture simple robotic tasks in a useful way for NLU. Fig.…”
Section: HRI Intentions as Adjustable Hierarchical Action Graphs (mentioning)
Confidence: 99%