2020
DOI: 10.1609/aaai.v34i05.6455

Storytelling from an Image Stream Using Scene Graphs

Abstract: Visual storytelling aims at generating a story from an image stream. Most existing methods tend to represent images directly with the extracted high-level features, which is not intuitive and difficult to interpret. We argue that translating each image into a graph-based semantic representation, i.e., scene graph, which explicitly encodes the objects and relationships detected within image, would benefit representing and describing images. To this end, we propose a novel graph-based architecture for visual sto…

Citations: cited by 51 publications (44 citation statements)
References: 24 publications
“…Non-Classic Automatic Evaluation: BLEURT, voc-d, and MLTD Many VIST studies have shown that classic automatic evaluation scores like BLEU and METEOR correlate poorly with human judgment (Hsu et al, 2020; Hu et al, 2019; Wang et al, 2020; Hsu et al, 2019; Wang et al, 2018a; Modi and Parde, 2019). These n-gram matching metrics fail to account for the semantic similarity to the reference stories and lexical richness in the generated stories.…”
Section: Evaluation Methods (mentioning)
confidence: 99%
“…Leveraging External Resources for VIST Another set of work leverages external resources and knowledge to enrich the generated visual stories. For example, apply Concept-Net (Liu and Singh, 2004) and self-attention for create commonsense-augmented image features; Wang et al (2020) use graph convolution networks on scene graphs (Johnson et al, 2018) to associate objects across images; and KG-Story (Hsu et al, 2020) is a three-stage VIST framework that uses Visual Genome (Krishna et al, 2017) to produce knowledge-enriched visual stories.…”
Section: Related Work (mentioning)
confidence: 99%
“…More recently, Wang et al (2018a) and Wang et al (2018b) propose to utilize reinforcement learning frameworks for this task. Wang et al (2020) propose to translate images to graph-based semantic representations to benefit representing images, while it is not fair to compare with our method because they introduce information from other datasets. The most similar work to ours is from Huang et al (2019).…”
Section: Case Study (mentioning)
confidence: 99%
“…It requires the model to understand the main idea of an image stream and generate coherent sentences. Most of existing methods (Huang et al, 2016; Yu et al, 2017a; Wang et al, 2018a; Wang et al, 2020) for visual storytelling extend approaches of image captioning without considering topic information of the image sequence, which causes the problem of generating semantically incoherent content.…”
Section: Introduction (mentioning)
confidence: 99%