Storyboarding of Recipes: Grounded Contextual Generation

Chandu, Khyathi Raghavi; Nyberg, Eric; Black, Alan W.

doi:10.18653/v1/p19-1606

Cited by 30 publications

(44 citation statements)

References 25 publications


“…• Non-textual Modality: Static grounding here includes using adversarial references to ground visual referring expressions (Akula et al, 2020), narration (Chandu et al, 2019b, 2020a), language learning (Suglia et al, 2020; Jin et al, 2020) etc.,…”

Section: Approaches To Grounding

Citation type: mentioning

confidence: 99%

Grounding ‘Grounding’ in NLP

Chandu

Bisk

Black

2021

Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

Self Cite


The NLP community has seen substantial recent interest in grounding to facilitate interaction between language technologies and the world. However, as a community, we use the term broadly to reference any linking of text to data or non-textual modality. In contrast, Cognitive Science more formally defines "grounding" as the process of establishing what mutual information is required for successful communication between two interlocutors, a definition which might implicitly capture the NLP usage but differs in intent and scope. We investigate the gap between these definitions and seek answers to the following questions: (1) What aspects of grounding are missing from NLP tasks? Here we present the dimensions of coordination, purviews and constraints. (2) How is the term "grounding" used in the current research? We study the trends in datasets, domains, and tasks introduced in recent NLP conferences. And finally, (3) How to advance our current definition to bridge the gap with Cognitive Science? We present ways to both create new tasks or repurpose existing ones to make advancements towards achieving a more complete sense of grounding.


“…In [5] they release a dataset of sequenced image-text pairs in the cooking domain, with focus on text generation conditioned on images. RecipeQA [34] is another popular dataset, used for multimodal comprehension and reasoning, with 36K questions about the 20K recipes and illustrative images for each step of the recipes.…”

Section: Story

Citation type: mentioning

confidence: 99%

Variational Recurrent Sequence-to-Sequence Retrieval for Stepwise Illustration

Batra

Haldar

et al. 2020

Lecture Notes in Computer Science


We address and formalise the task of sequence-to-sequence (seq2seq) cross-modal retrieval. Given a sequence of text passages as query, the goal is to retrieve a sequence of images that best describes and aligns with the query. This new task extends the traditional cross-modal retrieval, where each image-text pair is treated independently ignoring broader context. We propose a novel variational recurrent seq2seq (VRSS) retrieval model for this seq2seq task. Unlike most cross-modal methods, we generate an image vector corresponding to the latent topic obtained from combining the text semantics and context. This synthetic image embedding point associated with every text embedding point can then be employed for either image generation or image retrieval as desired. We evaluate the model for the application of stepwise illustration of recipes, where a sequence of relevant images are retrieved to best match the steps described in the text. To this end, we build and release a new Stepwise Recipe dataset for research purposes, containing 10K recipes (sequences of image-text pairs) having a total of 67K image-text pairs. To our knowledge, it is the first publicly available dataset to offer rich semantic descriptions in a focused category such as food or recipes. Our model is shown to outperform several competitive and relevant baselines in the experiments. We also provide qualitative analysis of how semantically meaningful the results produced by our model are through human evaluation and comparison with relevant existing methods.


“…In addition to static images, Gella et al (2018) have also collected a dataset of describing stories from videos uploaded on social media. Chandu et al (2019) recently introduced a dataset for generating textual cooking recipes from a sequence of images and proposed two models to incorporate structure in procedural text generation from images.…”

Section: Related Work

Citation type: mentioning

confidence: 99%

“My Way of Telling a Story”: Persona based Grounded Story Generation

Chandu

Prabhumoye

Salakhutdinov

et al. 2019

Proceedings of the Second Workshop on Storytelling

Self Cite


Visual storytelling is the task of generating stories based on a sequence of images. Inspired by the recent works in neural generation focusing on controlling the form of text, this paper explores the idea of generating these stories in different personas. However, one of the main challenges of performing this task is the lack of a dataset of visual stories in different personas. Having said that, there are independent datasets for both visual storytelling and annotated sentences for various persona. In this paper we describe an approach to overcome this by getting labelled persona data from a different task and leveraging those annotations to perform persona based story generation. We inspect various ways of incorporating personality in both the encoder and the decoder representations to steer the generation in the target direction. To this end, we propose five models which are incremental extensions to the baseline model to perform the task at hand. In our experiments we use five different personas to guide the generation process. We find that the models based on our hypotheses perform better at capturing words while generating stories in the target persona.

