2016
DOI: 10.1007/978-3-319-46475-6_38

Title Generation for User Generated Videos

Abstract: A great video title describes the most salient event compactly and captures the viewer's attention. In contrast, video captioning tends to generate sentences that describe the video as a whole. Although generating a video title automatically is a very useful task, it is much less addressed than video captioning. We address video title generation for the first time by proposing two methods that extend state-of-the-art video captioners to this new task. First, we make video captioners highlight sensiti…

Cited by 68 publications (49 citation statements)
References 40 publications (106 reference statements)
“…Recently, a few video description datasets have been proposed, namely MSR-VTT (Xu et al 2016), TGIF (Li et al 2016) and VTW (Zeng et al 2016). Similar to the MSVD dataset (Chen and Dolan 2011), MSR-VTT is based on YouTube clips.…”
Section: Comparison To Other Video Description Datasets
confidence: 99%
“…Since labels are not required, their method can be fully unsupervised. We train their model on the four datasets they used, and additionally expand the training with the VTW dataset [46].…”
Section: Highlightness Score
confidence: 99%
“…The second dataset, i.e., VTW, was originally proposed for the task of video captioning and contains 18,100 videos in total [34]. Fortunately, 2,000 of them are labeled with subshot-level highlight scores that indicate the confidence of each subshot to be selected into the summary, so they are employed in this paper.…”
Section: Setup
confidence: 99%