2018
DOI: 10.1007/978-3-030-01264-9_12
Weakly-Supervised Video Summarization Using Variational Encoder-Decoder and Web Prior

Cited by 70 publications (50 citation statements)
References 29 publications
“…Our method outperforms all the baselines, including the supervised ranking-based methods [33,9] and VESD [3]. We also implement a baseline where we train classifiers (CLA) with our hashtagged Instagram videos.…”
Section: Methods
confidence: 98%
“…Our goal is analogous to the existing works on keyframe extraction from videos. The problem of keyframe extraction has been extensively studied in the context of video summarization [DKD98, CSJ15, ZCSG16, KVGUH18, DM18, CZDZ18] and scene change detection [MJC95, Sha95, HK17]. The goal of video summarization, however, is to find a compact set of images that can well represent as much video content as possible.…”
Section: Related Work
confidence: 99%
“…To some extent, our goal is analogous to the works of keyframe extraction from videos [DKD98, CSJ15, ZCSG16, KVGUH18, DM18, CZDZ18, MJC95, Sha95, HK17]. A key difference is that we need to extract the “keyframes” based on the path of walking, instead of video content.…”
Section: Introduction
confidence: 99%
“…Weakly supervised methods require only a small number of annotations and can achieve strong performance. [1] proposed a weakly supervised method that only requires the topic label for a video. A variational autoencoder (VAE) model is trained on massive edited videos with topic labels from the Internet to learn a better video representation.…”
Section: Related Work 2.1 Video Summarization
confidence: 99%
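The citation above only sketches the idea: encode frame features into a latent code via a VAE, using web videos as weak supervision. The paper's actual architecture is not reproduced here; the following is a minimal, hypothetical NumPy sketch of the variational encoder-decoder core (encode, reparameterize, decode, loss), with randomly initialized weights standing in for a trained model and all dimensions chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: per-frame CNN features -> latent code.
feat_dim, hidden_dim, latent_dim = 128, 64, 16

# Randomly initialized weights; a real model would learn these
# from web videos with topic labels (the weak supervision).
W_enc = rng.normal(0, 0.1, (feat_dim, hidden_dim))
W_mu = rng.normal(0, 0.1, (hidden_dim, latent_dim))
W_logvar = rng.normal(0, 0.1, (hidden_dim, latent_dim))
W_dec = rng.normal(0, 0.1, (latent_dim, feat_dim))

def encode(x):
    """Map frame features to the mean and log-variance of q(z|x)."""
    h = np.tanh(x @ W_enc)
    return h @ W_mu, h @ W_logvar

def reparameterize(mu, logvar):
    """Sample z = mu + sigma * eps (the reparameterization trick)."""
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def decode(z):
    """Reconstruct the frame features from the latent code."""
    return z @ W_dec

def vae_loss(x):
    """Reconstruction error plus KL divergence to the unit Gaussian prior."""
    mu, logvar = encode(x)
    recon = decode(reparameterize(mu, logvar))
    recon_err = np.mean((x - recon) ** 2)
    kl = -0.5 * np.mean(1 + logvar - mu**2 - np.exp(logvar))
    return recon_err + kl

x = rng.normal(size=(32, feat_dim))  # a batch of 32 frame features
loss = vae_loss(x)
```

In the cited setup the learned latent space, shaped by topic labels rather than per-frame annotations, is what makes the representation useful for summarization; this sketch shows only the generic VAE mechanics, not that weak-supervision signal.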