2020
DOI: 10.1609/aaai.v34i05.6303
Knowledge-Enriched Visual Storytelling

Abstract: Stories are diverse and highly personalized, resulting in a large possible output space for story generation. Existing end-to-end approaches produce monotonous stories because they are limited to the vocabulary and knowledge in a single training dataset. This paper introduces KG-Story, a three-stage framework that allows the story generation model to take advantage of external Knowledge Graphs to produce interesting stories. KG-Story distills a set of representative words from the input prompts, enriches the w…

Cited by 30 publications (40 citation statements)
References 14 publications
“…Leveraging External Resources for VIST Another set of work leverages external resources and knowledge to enrich the generated visual stories. For example, one approach applies ConceptNet (Liu and Singh, 2004) and self-attention to create commonsense-augmented image features; Wang et al (2020) use graph convolution networks on scene graphs (Johnson et al, 2018) to associate objects across images; and KG-Story (Hsu et al, 2020) is a three-stage VIST framework that uses Visual Genome (Krishna et al, 2017) to produce knowledge-enriched visual stories.…”
Section: Related Work
confidence: 99%
“…Terms These are story-like nouns such as events, time, and locations, which current object detection models are unable to extract. Therefore, we further use a Transformer-GRU (Hsu et al, 2020) to predict story-like terms. For each image and story pair, we use image objects as the input and the nouns in the corresponding human-written story as the ground truth.…”
Section: Story Element Extraction
confidence: 99%
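The supervision described in the excerpt above (detected image objects as input, nouns from the human-written story as ground-truth terms) can be sketched as a data-preparation step. Everything below is illustrative: the object lists, stories, and the toy noun set are placeholders, and the cited work trains a Transformer-GRU on such pairs rather than this simple pairing over real POS-tagged VIST stories.

```python
# Build (input, target) pairs for story-like term prediction, as the
# excerpt describes: image objects in, story nouns out.
# All data here is hypothetical, for illustration only.

# Hypothetical detected objects per image, paired with a human-written story.
examples = [
    (["man", "cake", "table"],
     "The family gathered for a birthday party in the evening."),
    (["dog", "ball", "grass"],
     "We spent the afternoon at the park playing fetch."),
]

# Toy noun vocabulary standing in for a real POS tagger (an assumption;
# actual noun extraction would run over the full VIST stories).
NOUNS = {"family", "birthday", "party", "evening", "afternoon", "park", "fetch"}

def make_pairs(examples, nouns):
    """Pair image objects (model input) with story nouns (ground truth)."""
    pairs = []
    for objects, story in examples:
        tokens = story.lower().strip(".").split()
        targets = [t for t in tokens if t in nouns]
        pairs.append((objects, targets))
    return pairs

for inp, tgt in make_pairs(examples, NOUNS):
    print(inp, "->", tgt)
```

A trained term predictor would replace the toy noun lookup, mapping unseen object lists to story-like terms (events, times, locations) that object detectors cannot produce directly.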
“…So we adapt a few typical works to fit the few-shot setting for comparison. It is noted that, though [10,12] achieve higher scores under the standard setting, we do not compare with them since they have used extra resources such as "Pretrained BERT" and "Knowledge Graph". The descriptions for these models are as follows:…”
Section: Comparison With SOTA
confidence: 99%
“…Generating vocabulary from a user's contextual data through Natural Language Generation (NLG) techniques seems an obvious avenue to facilitate social interactions. Although NLG has been successfully applied in the context of task-oriented dialogs (He et al, 2017), question answering (Su et al, 2016), text summarization (See et al, 2017), and story generation from photograph sequences (Hsu et al, 2020), it is unclear how these techniques can be adapted to the specific needs of AAC support (Tintarev et al, 2014).…”
Section: Introduction
confidence: 99%