2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr.2018.00794
Show Me a Story: Towards Coherent Neural Story Illustration

Cited by 29 publications (28 citation statements)
References 21 publications
“…• CNSI [12]: a global visual-semantic matching model that uses a hand-crafted coherence feature in its encoder. • No Context [11]: the state-of-the-art dense visual-semantic matching model for text-to-image retrieval.…”
Section: Quantitative Results
confidence: 99%
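The quoted comparison distinguishes global matching (one embedding per story and per image) from dense matching (per-word similarities against bottom-up region features). A minimal sketch of the two scoring schemes, with random vectors standing in for real embeddings (the function names, dimensions, and aggregation by per-word maxima are illustrative assumptions, not the cited models' exact formulation):

```python
import numpy as np

def l2norm(x, axis=-1):
    """Normalize vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + 1e-8)

def global_score(text_vec, image_vec):
    # Global matching: a single cosine similarity between one story embedding
    # and one image embedding.
    return float(l2norm(text_vec) @ l2norm(image_vec))

def dense_score(word_vecs, region_vecs):
    # Dense matching: each word is matched to its best region; the image score
    # averages those per-word maxima.
    sims = l2norm(word_vecs) @ l2norm(region_vecs).T   # (n_words, n_regions)
    return float(sims.max(axis=1).mean())

rng = np.random.default_rng(0)
words = rng.standard_normal((6, 64))      # hypothetical word embeddings
regions = rng.standard_normal((36, 64))   # hypothetical bottom-up region features
candidates = [rng.standard_normal((36, 64)) for _ in range(5)]
best = max(range(5), key=lambda i: dense_score(words, candidates[i]))  # retrieved image index
```

Ranking candidate images by `dense_score` is what makes the retrieval "dense": a single off-topic region does not drag down the score the way it can in a pooled global embedding.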
“…Table 1 presents the story-to-image retrieval performance of the four models on the VIST test set. The "No Context" model achieved significant improvements over the previous CNSI [12] method, mainly attributed to dense visual-semantic matching with bottom-up region features instead of global matching. The CADM model without attention boosts the performance of the "No Context" model with fixed context, which demonstrates the importance of contextual information for story-to-image retrieval.…”
Section: Quantitative Results
confidence: 99%
“…Despite being unsuitable for our inverse problem, VIST has also been used for retrieving images when given text, in work related to ours. In an approach called Coherent Neural Story Illustration (CNSI), an encoder-decoder network [27] was built to first encode sentences using a hierarchical two-level sentence-story gated recurrent unit (GRU), and then sequentially decode into a corresponding sequence of illustrative images. A previously proposed coherence model [24] was used to explicitly model co-references between sentences.…”
Section: Story
confidence: 99%
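The hierarchical two-level sentence-story GRU encoder described above can be sketched as follows. This is an illustrative, untrained numpy implementation under assumed dimensions; the class name, weight initialization, and toy inputs are hypothetical, not CNSI's actual code:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUEncoder:
    """Minimal GRU cell; weights are random since this is an untrained sketch."""
    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        def w(rows, cols):
            return 0.1 * rng.standard_normal((rows, cols))
        self.Wz, self.Uz = w(hidden_dim, input_dim), w(hidden_dim, hidden_dim)
        self.Wr, self.Ur = w(hidden_dim, input_dim), w(hidden_dim, hidden_dim)
        self.Wh, self.Uh = w(hidden_dim, input_dim), w(hidden_dim, hidden_dim)
        self.hidden_dim = hidden_dim

    def encode(self, seq):
        """Run the GRU over a sequence of vectors; return the final hidden state."""
        h = np.zeros(self.hidden_dim)
        for x in seq:
            z = sigmoid(self.Wz @ x + self.Uz @ h)        # update gate
            r = sigmoid(self.Wr @ x + self.Ur @ h)        # reset gate
            h_cand = np.tanh(self.Wh @ x + self.Uh @ (r * h))
            h = (1.0 - z) * h + z * h_cand
        return h

# Two-level hierarchy: a word-level GRU encodes each sentence into one vector,
# then a story-level GRU encodes the sequence of sentence vectors.
word_gru = GRUEncoder(input_dim=16, hidden_dim=32)
story_gru = GRUEncoder(input_dim=32, hidden_dim=32, seed=1)

rng = np.random.default_rng(2)
story = [rng.standard_normal((5, 16)) for _ in range(4)]   # 4 sentences, 5 "word vectors" each
sentence_vecs = [word_gru.encode(sent) for sent in story]
story_state = story_gru.encode(sentence_vecs)              # one vector summarizing the story
```

In the full CNSI pipeline, a decoder would condition on such story-level states to select one illustrative image per sentence; only the two-level encoding is sketched here.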
“…With the rapid growth of multimedia data [23,30], understanding visual content and interpreting it in natural language have been important yet challenging tasks that could benefit a wide range of real-world applications, such as storytelling [16,36,45], poetry creation [26,27,50,51] and support of the disabled. While deep learning techniques have made remarkable progress in describing visual content via image captioning [10,25,39,55], the obtained results are generally sentence-level, with fewer than twenty words.…”
Section: Introduction
confidence: 99%