Proceedings of the Second Workshop on Shortcomings in Vision and Language 2019
DOI: 10.18653/v1/w19-1805

The Steep Road to Happily Ever after: an Analysis of Current Visual Storytelling Models

Abstract: Visual storytelling is an intriguing and complex task that only recently entered the research arena. In this work, we survey relevant work to date, and conduct a thorough error analysis of three very recent approaches to visual storytelling. We categorize and provide examples of common types of errors, and identify key shortcomings in current work. Finally, we make recommendations for addressing these limitations in the future.

Cited by 9 publications (6 citation statements); references 20 publications (29 reference statements).

“…Table 1 shows the performances of different models on seven automatic evaluation metrics. Some works (Wang et al. 2018a; Modi and Parde 2019) have confirmed that CIDEr does not correlate well with human evaluations in this task, but here we still adopt this metric for reference. Overall, the results indicate that our proposed SGVST model achieves superior performance over other state-of-the-art models optimized with MLE and RL, which directly demonstrates that our graph-based model can help with story generation.…”
Section: Quantitative Results (mentioning)
confidence: 96%
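
For readers unfamiliar with these metrics, the sketch below shows how a reference-based score is typically computed for a generated story sentence, using NLTK's sentence-level BLEU as a stand-in (CIDEr follows the same reference-based pattern but weights n-gram overlap by TF-IDF). The example strings and the smoothing choice are illustrative assumptions, not taken from the cited work.

```python
# Minimal sketch: scoring a generated story sentence against reference stories
# with sentence-level BLEU (NLTK). CIDEr follows the same reference-based
# pattern but applies TF-IDF weighting to n-grams; the strings here are
# illustrative placeholders only.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

references = [
    "we went to the beach and watched the sunset".split(),
    "the family spent the evening by the sea".split(),
]
hypothesis = "the family watched the sunset at the beach".split()

# Smoothing avoids zero scores when higher-order n-grams have no overlap,
# which is common for short, story-like sentences.
score = sentence_bleu(
    references,
    hypothesis,
    smoothing_function=SmoothingFunction().method1,
)
print(f"BLEU: {score:.3f}")
```
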
“…Non-Classic Automatic Evaluation: BLEURT, voc-d, and MLTD. Many VIST studies have shown that classic automatic evaluation scores like BLEU and METEOR correlate poorly with human judgment (Hsu et al., 2020; Hu et al., 2019; Wang et al., 2020; Hsu et al., 2019; Wang et al., 2018a; Modi and Parde, 2019). These n-gram matching metrics fail to account for the semantic similarity to the reference stories and lexical richness in the generated stories.…”
Section: Evaluation Methods (mentioning)
confidence: 99%
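
As a rough illustration of the lexical-richness side of this argument, below is a simplified, forward-only sketch of MTLD, one of the lexical-diversity measures named above (the published measure averages a forward and a backward pass, which is omitted here). The example story and the standard 0.72 TTR threshold are illustrative; this is not the cited tooling.

```python
# Simplified, forward-only MTLD sketch: count how many "factors" (token runs
# whose type-token ratio falls to the threshold) fit in the text, then divide
# the token count by the factor count. Higher values mean richer vocabulary.
def mtld_forward(tokens, ttr_threshold=0.72):
    factors = 0.0
    types, count = set(), 0
    for tok in tokens:
        count += 1
        types.add(tok.lower())
        ttr = len(types) / count
        if ttr <= ttr_threshold:
            factors += 1.0          # a full factor is complete; reset the window
            types, count = set(), 0
    if count > 0:                   # credit the leftover partial factor
        ttr = len(types) / count
        factors += (1.0 - ttr) / (1.0 - ttr_threshold)
    return len(tokens) / factors if factors > 0 else 0.0

story = ("we packed the car early and drove to the lake "
         "the kids swam all afternoon and we grilled dinner by the water").split()
print(f"MTLD (forward only): {mtld_forward(story):.1f}")
```
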
“…Multimodal learning has recently gained attention due to the poor performance of existing (unimodal) models on multimodal tasks (Lippe et al., 2020), with most recent (context-aware) solutions employing neural architectures such as CNNs, RNNs, and Transformer-based attention models like BERT (Afridi et al., 2020; Modi and Parde, 2019; Parde, 2020). Although existing work on hate speech detection has largely relied on text-based features, this has gradually started to shift with the introduction of multimodal datasets (Lippe et al., 2020).…”
Section: Multimodal Classification of Hateful Memes (mentioning)
confidence: 99%
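
Below is a minimal late-fusion sketch of the kind of multimodal classifier this passage describes: pooled image features from a CNN and a sentence embedding from a text encoder are concatenated and passed to a small classification head. The feature dimensions, layer sizes, and two-class output are illustrative assumptions (PyTorch), not details from any of the cited systems.

```python
# Late-fusion multimodal classifier sketch: concatenate image and text feature
# vectors, then classify with a small MLP head. Random tensors stand in for
# real CNN / text-encoder features in the usage example.
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    def __init__(self, img_dim=2048, txt_dim=768, hidden=512, n_classes=2):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(img_dim + txt_dim, hidden),  # fuse by concatenation
            nn.ReLU(),
            nn.Dropout(0.1),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, img_feats, txt_feats):
        # img_feats: (batch, img_dim), e.g. pooled CNN features
        # txt_feats: (batch, txt_dim), e.g. a [CLS]-style sentence embedding
        fused = torch.cat([img_feats, txt_feats], dim=-1)
        return self.head(fused)

model = LateFusionClassifier()
logits = model(torch.randn(4, 2048), torch.randn(4, 768))
print(logits.shape)  # torch.Size([4, 2])
```
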