Proceedings of the Second Workshop on Shortcomings in Vision and Language 2019
DOI: 10.18653/v1/w19-1805

The Steep Road to Happily Ever after: an Analysis of Current Visual Storytelling Models

Abstract: Visual storytelling is an intriguing and complex task that only recently entered the research arena. In this work, we survey relevant work to date, and conduct a thorough error analysis of three very recent approaches to visual storytelling. We categorize and provide examples of common types of errors, and identify key shortcomings in current work. Finally, we make recommendations for addressing these limitations in the future.

Cited by 9 publications (6 citation statements); references 20 publications (29 reference statements).

“…Table 1 shows the performances of different models on seven automatic evaluation metrics. Some works (Wang et al. 2018a; Modi and Parde 2019) have confirmed that CIDEr does not correlate well with human evaluations in this task, but here we still adopt this metric for reference. Overall, the results indicate that our proposed SGVST model achieves superior performance over other state-of-the-art models optimized with MLE and RL, which directly demonstrates that our graph-based model can help with story generation.…”
Section: Quantitative Results (mentioning)
confidence: 96%
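
For readers unfamiliar with these metrics, the sketch below shows how a reference-based score is typically computed for a generated story sentence, using NLTK's sentence-level BLEU as a stand-in (CIDEr follows the same reference-based pattern but weights n-gram overlap by TF-IDF). The example strings and the smoothing choice are illustrative assumptions, not taken from the cited work.

```python
# Minimal sketch: scoring a generated story sentence against reference stories
# with sentence-level BLEU (NLTK). CIDEr follows the same reference-based
# pattern but applies TF-IDF weighting to n-grams; the strings here are
# illustrative placeholders only.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

references = [
    "we went to the beach and watched the sunset".split(),
    "the family spent the evening by the sea".split(),
]
hypothesis = "the family watched the sunset at the beach".split()

# Smoothing avoids zero scores when higher-order n-grams have no overlap,
# which is common for short, story-like sentences.
score = sentence_bleu(
    references,
    hypothesis,
    smoothing_function=SmoothingFunction().method1,
)
print(f"BLEU: {score:.3f}")
```
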
“…Non-Classic Automatic Evaluation: BLEURT, voc-d, and MLTD. Many VIST studies have shown that classic automatic evaluation scores like BLEU and METEOR correlate poorly with human judgment (Hsu et al., 2020; Hu et al., 2019; Wang et al., 2020; Hsu et al., 2019; Wang et al., 2018a; Modi and Parde, 2019). These n-gram matching metrics fail to account for the semantic similarity to the reference stories and lexical richness in the generated stories.…”
Section: Evaluation Methods (mentioning)
confidence: 99%
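
As a rough illustration of the lexical-richness side of this argument, below is a simplified, forward-only sketch of MTLD, one of the lexical-diversity measures named above (the published measure averages a forward and a backward pass, which is omitted here). The example story and the standard 0.72 TTR threshold are illustrative; this is not the cited tooling.

```python
# Simplified, forward-only MTLD sketch: count how many "factors" (token runs
# whose type-token ratio falls to the threshold) fit in the text, then divide
# the token count by the factor count. Higher values mean richer vocabulary.
def mtld_forward(tokens, ttr_threshold=0.72):
    factors = 0.0
    types, count = set(), 0
    for tok in tokens:
        count += 1
        types.add(tok.lower())
        ttr = len(types) / count
        if ttr <= ttr_threshold:
            factors += 1.0          # a full factor is complete; reset the window
            types, count = set(), 0
    if count > 0:                   # credit the leftover partial factor
        ttr = len(types) / count
        factors += (1.0 - ttr) / (1.0 - ttr_threshold)
    return len(tokens) / factors if factors > 0 else 0.0

story = ("we packed the car early and drove to the lake "
         "the kids swam all afternoon and we grilled dinner by the water").split()
print(f"MTLD (forward only): {mtld_forward(story):.1f}")
```
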
“…Multimodal learning has recently gained attention due to the poor performance of existing (unimodal) models on multimodal tasks (Lippe et al., 2020), with most recent (context-aware) solutions employing neural architectures such as CNNs, RNNs, and Transformer-based attention models like BERT (Afridi et al., 2020; Modi and Parde, 2019; Parde, 2020). Although existing work on hate speech detection has largely relied on text-based features, this has gradually started to shift with the introduction of multimodal datasets (Lippe et al., 2020).…”
Section: Multimodal Classification of Hateful Memes (mentioning)
confidence: 99%
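
Below is a minimal late-fusion sketch of the kind of multimodal classifier this passage describes: pooled image features from a CNN and a sentence embedding from a text encoder are concatenated and passed to a small classification head. The feature dimensions, layer sizes, and two-class output are illustrative assumptions (PyTorch), not details from any of the cited systems.

```python
# Late-fusion multimodal classifier sketch: concatenate image and text feature
# vectors, then classify with a small MLP head. Random tensors stand in for
# real CNN / text-encoder features in the usage example.
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    def __init__(self, img_dim=2048, txt_dim=768, hidden=512, n_classes=2):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(img_dim + txt_dim, hidden),  # fuse by concatenation
            nn.ReLU(),
            nn.Dropout(0.1),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, img_feats, txt_feats):
        # img_feats: (batch, img_dim), e.g. pooled CNN features
        # txt_feats: (batch, txt_dim), e.g. a [CLS]-style sentence embedding
        fused = torch.cat([img_feats, txt_feats], dim=-1)
        return self.head(fused)

model = LateFusionClassifier()
logits = model(torch.randn(4, 2048), torch.randn(4, 768))
print(logits.shape)  # torch.Size([4, 2])
```
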