HSA-RNN: Hierarchical Structure-Adaptive RNN for Video Summarization

Zhao, Bin; Li, Xuelong; Lu, Xiaoqiang

doi:10.1109/cvpr.2018.00773

Cited by 182 publications

(102 citation statements)

References 21 publications

Supporting

Mentioning

100

Contrasting

Unclassified


“…A typical result of these approaches is a sequence of keyframes or a video excerpt comprising the most important parts of a video. More recent methods treat video summarization as an optimization problem [7,10,34] or they utilize recurrent neural networks [35,36] based on, for instance, long short-term memory cells (LSTMs), which are able to capture temporal or sequential information very well. Another use case for LSTMs is proposed by Mahasseni et al. [22], who suggest a generative adversarial network (GAN) consisting of an LSTM-based autoencoder and a discriminator.…”

Section: Related Work (mentioning)

confidence: 99%

Visual Summarization of Scholarly Videos Using Word Embeddings and Keyphrase Extraction

Zhou, Otto, Ewerth

2019

Digital Libraries for Open Knowledge


Effective learning with audiovisual content depends on many factors. Besides the quality of the learning resource's content, it is essential to discover the most relevant and suitable video in order to support the learning process most effectively. Video summarization techniques facilitate this goal by providing a quick overview of the content. This is especially useful for longer recordings such as conference presentations or lectures. In this paper, we present an approach that generates a visual summary of video content based on semantic word embeddings and keyphrase extraction. For this purpose, we exploit video annotations that are automatically generated by speech recognition and video OCR (optical character recognition). We demonstrate the feasibility of the proposed approach through its incorporation into the TIB AV portal (http://av.tib.eu/), which is a platform for scientific videos. The accuracy and usefulness of the generated video content visualizations are evaluated in a user study.



“…Unsupervised video summarization methods [5], [6] usually use manually defined criteria to extract key frames or key shots. While supervised ones [7], [8] learn models with the help of human-annotated data to determine which frames or shots are more important. In this paper, we mainly focus on supervised ones.…”

Section: Introduction (mentioning)

confidence: 99%

Meta Learning for Task-Driven Video Summarization

Liu, Dong

2020

IEEE Trans. Ind. Electron.

Self Cite


Existing video summarization approaches mainly concentrate on the sequential or structural characteristics of video data. However, they do not pay enough attention to the video summarization task itself. In this paper, we propose a meta learning method for performing task-driven video summarization, denoted by MetaL-TDVS, to explicitly explore the video summarization mechanism among summarizing processes on different videos. In particular, MetaL-TDVS aims to uncover the latent mechanism for summarizing video by reformulating video summarization as a meta learning problem, and to promote the generalization ability of the trained model. MetaL-TDVS regards summarizing each video as a single task, so that the experience and knowledge learned from summarizing other videos can be better used to summarize new ones. Furthermore, MetaL-TDVS updates models via a two-fold back propagation which forces the model optimized on one video to obtain high accuracy on another video in every training step. Extensive experiments on benchmark datasets demonstrate the superiority and better generalization ability of MetaL-TDVS against several state-of-the-art methods.


“…Two kinds of methods are designed to avoid browsing the whole video. The first kind is video summarization methods [32,58], which generate a short synopsis for a long video. The second kind of methods [7,8,13,14,19,22,31,37,41] try to trim the video segment of interest.…”

Section: Introduction (mentioning)

confidence: 99%

Spatio-Temporal Video Re-Localization by Warp LSTM

Feng, Liu, et al.

2019

2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)


The need for efficiently finding the video content a user wants is increasing because of the explosion of user-generated videos on the Web. Existing keyword-based or content-based video retrieval methods usually determine what occurs in a video but not when and where. In this paper, we answer the question of when and where by formulating a new task, namely spatio-temporal video re-localization. Specifically, given a query video and a reference video, spatio-temporal video re-localization aims to localize tubelets in the reference video such that the tubelets semantically correspond to the query. To accurately localize the desired tubelets in the reference video, we propose a novel warp LSTM network, which propagates the spatio-temporal information for a long period and thereby captures the corresponding long-term dependencies. Another issue for spatio-temporal video re-localization is the lack of properly labeled video datasets. Therefore, we reorganize the videos in the AVA dataset to form a new dataset for spatio-temporal video re-localization research. Extensive experimental results show that the proposed model achieves superior performance over the designed baselines on the spatio-temporal video re-localization task.

