Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/2021.emnlp-main.165

Visual Goal-Step Inference using wikiHow

Abstract: Understanding what sequence of steps is needed to complete a goal can help artificial intelligence systems reason about human activities. Past work in NLP has examined the task of goal-step inference for text. We introduce the visual analogue. We propose the Visual Goal-Step Inference (VGSI) task, where a model is given a textual goal and must choose which of four images represents a plausible step towards that goal. With a new dataset harvested from wikiHow consisting of 772,277 images representing human act…
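The task format amounts to 4-way multiple choice: embed the textual goal and each candidate image in a shared space and pick the most similar image. The Python sketch below only illustrates that setup under assumptions and is not the authors' model; encode_text and encode_image are hypothetical stubs standing in for whatever joint image-text encoder (for example, a CLIP-style model) one plugs in.

import numpy as np

# Hypothetical stand-ins for a joint image-text encoder (e.g. a CLIP-style
# model). They return placeholder unit vectors so the sketch runs end to end;
# in practice you would replace them with a real multimodal encoder.
def encode_text(goal: str, dim: int = 512) -> np.ndarray:
    rng = np.random.default_rng(abs(hash(goal)) % (2**32))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

def encode_image(image_path: str, dim: int = 512) -> np.ndarray:
    rng = np.random.default_rng(abs(hash(image_path)) % (2**32))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

def vgsi_predict(goal: str, candidate_images: list[str]) -> int:
    """4-way multiple choice: return the index of the candidate image whose
    embedding has the highest cosine similarity to the goal embedding."""
    goal_vec = encode_text(goal)
    scores = [float(goal_vec @ encode_image(p)) for p in candidate_images]
    return int(np.argmax(scores))

# Example: one plausible step image and three distractors (file names are
# made up for illustration).
print(vgsi_predict("Make a pizza",
                   ["knead_dough.jpg", "change_tire.jpg",
                    "paint_wall.jpg", "tie_knot.jpg"]))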

Cited by 15 publications (16 citation statements)
References 14 publications
“…Procedural Knowledge: Procedural knowledge can be seen as a subset of knowledge pertaining to scripts (Abelson and Schank, 1977; Rudinger et al., 2015), schemata (Rumelhart, 1975) or events. A small body of previous work (Mujtaba and Mahapatra, 2019) on procedural events includes extracting them from instructional texts (Paris et al., 2002; Delpech and Saint-Dizier, 2008; Zhang et al., 2012) and videos (Alayrac et al., 2016; Yang et al., 2021a), reasoning about them (Takechi et al., 2003; Rajagopal et al., 2020), or showing their downstream applications (Pareti, 2018; Zhang et al., 2020d; Yang et al., 2021b; Zhang et al., 2020b; Lyu et al., 2021), specifically on intent reasoning (Sap et al., 2019; Zhang et al., 2020c). Most procedural datasets are collected by crowdsourcing and then manually cleaned (Singh et al., 2002; Regneri et al., 2010; Li et al., 2012; Wanzare et al., 2016; Rashkin et al., 2018) and are hence small.…”
Section: Related Work (mentioning)
confidence: 99%
“…Existing works also practice similar data splits that share the labels of videos/images across the training, development and the test set. For example, image retrieval tasks use the same object labels for training and evaluation (Wan et al., 2014); ActivityNet (Heilbron et al., 2015), a popular benchmark for human activity understanding, uses the same 203 activities across different splits; Yang et al. (2021b) train a step inference model with a training set that shares the same goals with the test set.…”
Section: B Video Retrieval Setup, B.1 Dataset Construction (mentioning)
confidence: 99%
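As a concrete illustration of the split convention this quoted passage describes (the same goal labels appearing in training, development and test, with only the instances differing), here is a minimal sketch assuming each example is a (goal, step) pair; it is not the cited papers' exact procedure.

import random
from collections import defaultdict

def split_sharing_goals(examples, dev_frac=0.1, test_frac=0.1, seed=0):
    """Partition (goal, step) pairs so that every goal with enough steps
    contributes examples to train, dev and test; the splits share goal
    labels and differ only in which step instances they contain."""
    rng = random.Random(seed)
    by_goal = defaultdict(list)
    for goal, step in examples:
        by_goal[goal].append((goal, step))
    train, dev, test = [], [], []
    for goal, items in by_goal.items():
        rng.shuffle(items)
        n = len(items)
        n_test = max(1, int(n * test_frac)) if n >= 3 else 0
        n_dev = max(1, int(n * dev_frac)) if n >= 3 else 0
        test.extend(items[:n_test])
        dev.extend(items[n_test:n_test + n_dev])
        train.extend(items[n_test + n_dev:])
    return train, dev, test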
“…Another line of work focuses on sequence-based methods, which take event-event relations into account and order event structures into sequences (Chambers and Jurafsky, 2008, 2009; Rudinger et al., 2015; Granroth-Wilding and Clark, 2016; Pichotta and Mooney, 2016; Modi, 2016; Weber et al., 2018, 2020a). Instead of representing events as structures, some work treats events as natural language steps and induces schema knowledge through story ending prediction (Mostafazadeh et al., 2016; Weber et al., 2020b; Kwon et al., 2020), machine reading comprehension (Ostermann et al., 2018, 2019), and schema goal-step prediction (Zhang et al., 2020; Yang et al., 2021). Instead of ignoring event structures or organizing events as simple sequences, we aim to capture the multi-dimensional evolution of events, as well as the structured connections.…”
Section: Related Work (mentioning)
confidence: 99%
“…Procedural language planning: Learning to generate goal-guided sequential language actions is an important task for many applications, including goal-step inference [28, 59, 64], embodied agents [49, 20, 1], and language-aided task adaptation [14]. Previous work views procedural script learning as a structured form of commonsense knowledge [15, 41, 51], while more recent work strengthens its association with the changing environments for executable action planning [39, 45].…”
Section: Related Work (mentioning)
confidence: 99%