Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/d15-1114

Mise en Place: Unsupervised Interpretation of Instructional Recipes

Abstract: We present an unsupervised hard EM approach to automatically mapping instructional recipes to action graphs, which define what actions should be performed on which objects and in what order. Recovering such structures can be challenging due to unique properties of procedural language where, for example, verbal arguments are commonly elided when they can be inferred from context and disambiguation often requires world knowledge. Our probabilistic model incorporates aspects of procedural semantics and world knowledge […]
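
The abstract describes alternating between hard latent assignments (which earlier action each argument refers to) and parameter re-estimation. The following is a minimal hard-EM sketch of that idea only; the data representation, the scoring function, and every name in it are illustrative assumptions, not the paper's actual model.

```python
# Hypothetical sketch of a hard-EM loop for inducing a recipe "action graph":
# every argument of every step is linked either to a raw ingredient or to the
# output of an earlier step. All names, data structures, and the scoring model
# are illustrative assumptions, not the paper's actual implementation.
import random
from collections import defaultdict

def candidate_origins(step_idx):
    """An argument at step i may come from the raw ingredient list
    (encoded as -1) or from the output of any earlier step."""
    return [-1] + list(range(step_idx))

def score(verb, origin, counts, smoothing=0.1):
    """Toy score for linking an argument of `verb` to `origin`:
    a smoothed co-occurrence count of the verb with the origin type."""
    origin_type = "raw" if origin == -1 else "output"
    return counts[(verb, origin_type)] + smoothing

def hard_em(recipes, iterations=10):
    # recipes: list of recipes, each a list of (verb, argument_text) steps.
    counts = defaultdict(float)
    # Random hard initialization of the latent origin of every argument.
    graphs = [[random.choice(candidate_origins(i)) for i in range(len(r))]
              for r in recipes]
    for _ in range(iterations):
        # M-step: re-estimate counts from the current hard assignments.
        counts.clear()
        for recipe, graph in zip(recipes, graphs):
            for (verb, _), origin in zip(recipe, graph):
                origin_type = "raw" if origin == -1 else "output"
                counts[(verb, origin_type)] += 1.0
        # Hard E-step: assign each argument its single best-scoring origin.
        for r_idx, recipe in enumerate(recipes):
            for i, (verb, _) in enumerate(recipe):
                graphs[r_idx][i] = max(candidate_origins(i),
                                       key=lambda o: score(verb, o, counts))
    return graphs, counts

if __name__ == "__main__":
    toy = [[("mix", "flour and sugar"), ("bake", "the mixture")],
           [("chop", "onions"), ("fry", "them")]]
    graphs, _ = hard_em(toy)
    # With a scorer this small the result depends on the random initialization;
    # the paper's model uses far richer procedural and world-knowledge features.
    print(graphs)
```

The point of the sketch is only the alternation itself: hard argmax assignments in the E-step, count-based re-estimation in the M-step.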

Cited by 78 publications (87 citation statements)
References 22 publications
“…We verify our approach using unstructured instructional videos readily available on YouTube [35]. By jointly optimizing on over two thousand YouTube instructional videos with no reference annotation, our joint visual-linguistic model improves 9% on both the precision and recall of reference resolution over the state-of-the-art linguistic-only model [23]. We further show that resolving reference is important to aligning unstructured speech transcriptions to videos, which are usually not perfectly aligned.…”
Section: Introduction (mentioning)
confidence: 76%
“…Localizing Video Segments with Natural Language. Prior work has considered aligning natural language with video, e.g., instructional videos with transcribed text (Kiddon et al., 2015; Huang et al., 2017; Malmaud et al., 2014, 2015). Our work is most related to recent work in video moment retrieval with natural language (Gao et al., 2017; Hendricks et al., 2017).…”
Section: Related Work (mentioning)
confidence: 99%
“…Finally, our teachers can be seen as rewarding generators that approximate script patterns in recipes. Previous work in learning script knowledge (Schank and Abelson, 1975) has focused on extracting scripts from long texts (Chambers and Jurafsky, 2009; Pichotta and Mooney, 2016), with some of that work focusing on recipes (Kiddon et al., 2015; Mori et al., 2014, 2012). Our teachers implicitly learn this script knowledge and reward recipe generators for exhibiting it.…”
Section: Related Work (mentioning)
confidence: 99%