Storytelling from an Image Stream Using Scene Graphs

Wang, Ruize; Wei, Zhongyu; Li, Piji; Zhang, Qi; Huang, Xuanjing

doi:10.1609/aaai.v34i05.6455

Cited by 51 publications

(44 citation statements)

References 24 publications

“…Non-Classic Automatic Evaluation: BLEURT, voc-d, and MLTD. Many VIST studies have shown that classic automatic evaluation scores like BLEU and METEOR correlate poorly with human judgment (Hsu et al, 2020; Hu et al, 2019; Wang et al, 2020; Hsu et al, 2019; Wang et al, 2018a; Modi and Parde, 2019). These n-gram matching metrics fail to account for the semantic similarity to the reference stories and lexical richness in the generated stories.…”

Section: Evaluation Methods (mentioning)

confidence: 99%

“…Leveraging External Resources for VIST. Another set of work leverages external resources and knowledge to enrich the generated visual stories. For example, apply Concept-Net (Liu and Singh, 2004) and self-attention to create commonsense-augmented image features; Wang et al (2020) use graph convolution networks on scene graphs (Johnson et al, 2018) to associate objects across images; and KG-Story (Hsu et al, 2020) is a three-stage VIST framework that uses Visual Genome (Krishna et al, 2017) to produce knowledge-enriched visual stories.…”

Section: Related Work (mentioning)

confidence: 99%

“…As for linking elements, most works generate visual stories in an end-to-end fashion (Huang et al, 2016; Kim et al, 2018), treating the task as a straightforward extension of image captioning. Recent works have begun to use relations between entities to improve visual storytelling, but often narrow in a particular subset of relations, such as relations between elements within the same image, relations between two adjacent images (Hsu et al, 2020), or relations between scenes (Wang et al, 2020). The full potential of rich real-world knowledge and intra-image relations have yet to be fully utilized.…”

Section: Introduction (mentioning)

confidence: 99%

Plot and Rework: Modeling Storylines for Visual Storytelling

Hsu, Chu, Huang, et al. 2021

Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

Writing a coherent and engaging story is not easy. Creative writers use their knowledge and worldview to put disjointed elements together to form a coherent storyline, and work and rework iteratively toward perfection. Automated visual storytelling (VIST) models, however, make poor use of external knowledge and iterative generation when attempting to create stories. This paper introduces PR-VIST, a framework that represents the input image sequence as a story graph in which it finds the best path to form a storyline. PR-VIST then takes this path and learns to generate the final story via a re-evaluating training process. This framework produces stories that are superior in terms of diversity, coherence, and humanness, per both automatic and human evaluations. An ablation study shows that both plotting and reworking contribute to the model's superiority.

“…More recently, Wang et al (2018a) and Wang et al (2018b) propose to utilize reinforcement learning frameworks for this task. Wang et al (2020) propose to translate images to graph-based semantic representations to benefit representing images, while it is not fair to compare with our method because they introduce information from other datasets. The most similar work to ours is from Huang et al (2019).…”

Section: Case Study (mentioning)

confidence: 99%

“…It requires the model to understand the main idea of an image stream and generate coherent sentences. Most of existing methods (Huang et al, 2016; Yu et al, 2017a; Wang et al, 2018a; Wang et al, 2020) for visual storytelling extend approaches of image captioning without considering topic information of the image sequence, which causes the problem of generating semantically incoherent content.…”

Section: Introduction (mentioning)

confidence: 99%

Keep it Consistent: Topic-Aware Storytelling from an Image Stream via Iterative Multi-agent Communication

Wang, Wei, Cheng, et al. 2020

Proceedings of the 28th International Conference on Computational Linguistics

Self Cite

Visual storytelling aims to generate a narrative paragraph from a sequence of images automatically. Existing approaches construct text description independently for each image and roughly concatenate them as a story, which leads to the problem of generating semantically incoherent content. In this paper, we propose a new way for visual storytelling by introducing a topic description task to detect the global semantic context of an image stream. A story is then constructed with the guidance of the topic description. In order to combine the two generation tasks, we propose a multi-agent communication framework that regards the topic description generator and the story generator as two agents and learn them simultaneously via iterative updating mechanism. We validate our approach on VIST dataset, where quantitative results, ablations, and human evaluation demonstrate our method's good ability in generating stories with higher quality compared to state-of-the-art methods.
