Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/2021.emnlp-main.165

Visual Goal-Step Inference using wikiHow

Abstract: Understanding what sequence of steps is needed to complete a goal can help artificial intelligence systems reason about human activities. Past work in NLP has examined the task of goal-step inference for text. We introduce the visual analogue. We propose the Visual Goal-Step Inference (VGSI) task, where a model is given a textual goal and must choose which of four images represents a plausible step towards that goal. With a new dataset harvested from wikiHow consisting of 772,277 images representing human act…
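The task format amounts to 4-way multiple choice: embed the textual goal and each candidate image in a shared space and pick the most similar image. The Python sketch below only illustrates that setup under assumptions and is not the authors' model; encode_text and encode_image are hypothetical stubs standing in for whatever joint image-text encoder (for example, a CLIP-style model) one plugs in.

import numpy as np

# Hypothetical stand-ins for a joint image-text encoder (e.g. a CLIP-style
# model). They return placeholder unit vectors so the sketch runs end to end;
# in practice you would replace them with a real multimodal encoder.
def encode_text(goal: str, dim: int = 512) -> np.ndarray:
    rng = np.random.default_rng(abs(hash(goal)) % (2**32))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

def encode_image(image_path: str, dim: int = 512) -> np.ndarray:
    rng = np.random.default_rng(abs(hash(image_path)) % (2**32))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

def vgsi_predict(goal: str, candidate_images: list[str]) -> int:
    """4-way multiple choice: return the index of the candidate image whose
    embedding has the highest cosine similarity to the goal embedding."""
    goal_vec = encode_text(goal)
    scores = [float(goal_vec @ encode_image(p)) for p in candidate_images]
    return int(np.argmax(scores))

# Example: one plausible step image and three distractors (file names are
# made up for illustration).
print(vgsi_predict("Make a pizza",
                   ["knead_dough.jpg", "change_tire.jpg",
                    "paint_wall.jpg", "tie_knot.jpg"]))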

Cited by 15 publications (16 citation statements)
References 14 publications
“…Procedural Knowledge: Procedural knowledge can be seen as a subset of knowledge pertaining to scripts (Abelson and Schank, 1977; Rudinger et al., 2015), schemata (Rumelhart, 1975) or events. A small body of previous work (Mujtaba and Mahapatra, 2019) on procedural events includes extracting them from instructional texts (Paris et al., 2002; Delpech and Saint-Dizier, 2008; Zhang et al., 2012) and videos (Alayrac et al., 2016; Yang et al., 2021a), reasoning about them (Takechi et al., 2003; Rajagopal et al., 2020), or showing their downstream applications (Pareti, 2018; Zhang et al., 2020d; Yang et al., 2021b; Zhang et al., 2020b; Lyu et al., 2021), specifically on intent reasoning (Sap et al., 2019; Zhang et al., 2020c). Most procedural datasets are collected by crowdsourcing and then manually cleaned (Singh et al., 2002; Regneri et al., 2010; Li et al., 2012; Wanzare et al., 2016; Rashkin et al., 2018) and are hence small.…”
Section: Related Work (mentioning)
confidence: 99%
“…Existing works also practice similar data splits that share the labels of videos/images across the training, development and the test set. For example, image retrieval tasks use the same object labels for training and evaluation (Wan et al., 2014); ActivityNet (Heilbron et al., 2015), a popular benchmark for human activity understanding, uses the same 203 activities across different splits; Yang et al. (2021b) train a step inference model with a training set that shares the same goals with the test set.…”
Section: B Video Retrieval Setup, B.1 Dataset Construction (mentioning)
confidence: 99%
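As a concrete illustration of the split convention this quoted passage describes (the same goal labels appearing in training, development and test, with only the instances differing), here is a minimal sketch assuming each example is a (goal, step) pair; it is not the cited papers' exact procedure.

import random
from collections import defaultdict

def split_sharing_goals(examples, dev_frac=0.1, test_frac=0.1, seed=0):
    """Partition (goal, step) pairs so that every goal with enough steps
    contributes examples to train, dev and test; the splits share goal
    labels and differ only in which step instances they contain."""
    rng = random.Random(seed)
    by_goal = defaultdict(list)
    for goal, step in examples:
        by_goal[goal].append((goal, step))
    train, dev, test = [], [], []
    for goal, items in by_goal.items():
        rng.shuffle(items)
        n = len(items)
        n_test = max(1, int(n * test_frac)) if n >= 3 else 0
        n_dev = max(1, int(n * dev_frac)) if n >= 3 else 0
        test.extend(items[:n_test])
        dev.extend(items[n_test:n_test + n_dev])
        train.extend(items[n_test + n_dev:])
    return train, dev, test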
“…Another line of work focuses on sequence-based methods, which take event-event relations into account and order event structures into sequences (Chambers and Jurafsky, 2008, 2009; Rudinger et al., 2015; Granroth-Wilding and Clark, 2016; Pichotta and Mooney, 2016; Modi, 2016; Weber et al., 2018, 2020a). Instead of representing events as structures, some work treats events as natural language steps and induces schema knowledge through story ending prediction (Mostafazadeh et al., 2016; Weber et al., 2020b; Kwon et al., 2020), machine reading comprehension (Ostermann et al., 2018, 2019), and schema goal-step prediction (Zhang et al., 2020; Yang et al., 2021). Instead of ignoring event structures or organizing events as simple sequences, we aim to capture the multi-dimensional evolution of events, as well as the structured connections.…”
Section: Related Work (mentioning)
confidence: 99%
“…Procedural language planning: Learning to generate goal-guided sequential language actions is an important task for many applications, including goal-step inference [28, 59, 64], embodied agents [49, 20, 1], and language-aided task adaptation [14]. Previous work views procedural script learning as a structured form of commonsense knowledge [15, 41, 51], while more recent work strengthens its association with the changing environments for executable action planning [39, 45].…”
Section: Related Work (mentioning)
confidence: 99%