Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics 2019
DOI: 10.18653/v1/p19-1606

Storyboarding of Recipes: Grounded Contextual Generation

Abstract: The information need of humans is essentially multimodal in nature, enabling maximum exploitation of situated context. We introduce a dataset for sequential procedural (how-to) text generation from images in the cooking domain. The dataset consists of 16,441 cooking recipes with 160,479 photos associated with different steps. We set up a baseline motivated by the best-performing model, in terms of human evaluation, for the Visual Story Telling (ViST) task. In addition, we introduce two models to incorporate high level st…

Cited by 30 publications (44 citation statements). References 25 publications.
“…• Non-textual Modality: Static grounding here includes using adversarial references to ground visual referring expressions (Akula et al., 2020), narration (Chandu et al., 2019b, 2020a), language learning (Suglia et al., 2020; Jin et al., 2020) etc.,…”
Section: Approaches To Grounding (mentioning)
“…In [5] they release a dataset of sequenced image-text pairs in the cooking domain, with focus on text generation conditioned on images. RecipeQA [34] is another popular dataset, used for multimodal comprehension and reasoning, with 36K questions about the 20K recipes and illustrative images for each step of the recipes.…”
Section: Story (mentioning)
“…In addition to static images, Gella et al. (2018) have also collected a dataset of descriptive stories from videos uploaded on social media. Chandu et al. (2019) recently introduced a dataset for generating textual cooking recipes from a sequence of images and proposed two models to incorporate structure in procedural text generation from images.…”
Section: Related Work (mentioning)