Dilated temporal relational adversarial network for generic video summarization

Zhang, Yujia; Kampffmeyer, Michael; Liang, Xiaodan; Zhang, Dingwen; Tan, Min; Xing, Eric P.

doi:10.1007/s11042-019-08175-y

Cited by 28 publications

(39 citation statements)

References 42 publications


“…Supervised methods [8,10,11,19,24,31,32,38,39,40,41,42,47] learn video summarization from labeled data consisting of raw videos and their corresponding groundtruth summary videos. Supervised methods tend to outperform unsupervised ones, since they can learn useful cues from ground truth summaries that are hard to capture with hand-crafted heuristics.…”

Section: Related Work (mentioning)

confidence: 99%

Video Summarization by Learning From Unpaired Data

Rochan

Wang

2019

2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

126


We consider the problem of video summarization. Given an input raw video, the goal is to select a small subset of key frames from the input video to create a shorter summary video that best describes the content of the original video. Most of the current state-of-the-art video summarization approaches use supervised learning and require labeled training data. Each training instance consists of a raw input video and its ground truth summary video curated by human annotators. However, it is very expensive and difficult to create such labeled training examples. To address this limitation, we propose a novel formulation to learn video summarization from unpaired data. We present an approach that learns to generate optimal video summaries using a set of raw videos (V) and a set of summary videos (S), where there exists no correspondence between V and S. We argue that this type of data is much easier to collect. Our model aims to learn a mapping function F: V → S such that the distribution of resultant summary videos from F(V) is similar to the distribution of S with the help of an adversarial objective. In addition, we enforce a diversity constraint on F(V) to ensure that the generated video summaries are visually diverse. Experimental results on two benchmark datasets indicate that our proposed approach significantly outperforms other alternative methods.
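The abstract says only that a diversity constraint is enforced on F(V); this page does not give its exact form. As a minimal sketch, assuming the constraint is a penalty equal to the mean pairwise cosine similarity between the feature vectors of the frames selected for the summary (a common choice in adversarial summarizers, not confirmed by this page), the arithmetic looks like this. The function names `cosine_similarity` and `diversity_penalty` are hypothetical, and a real training loop would compute the same quantity on differentiable tensors rather than Python lists.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def diversity_penalty(summary_features: Sequence[Vector]) -> float:
    """Hypothetical diversity term: mean cosine similarity over all ordered
    pairs (i, j), i != j, of summary frames. Lower values mean a more
    visually diverse summary, so a generator would minimise this term."""
    n = len(summary_features)
    if n < 2:
        return 0.0  # a single frame has no pairs to compare
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                total += cosine_similarity(summary_features[i], summary_features[j])
    return total / (n * (n - 1))


if __name__ == "__main__":
    print(diversity_penalty([[1.0, 0.0], [1.0, 0.0]]))  # duplicate frames
    print(diversity_penalty([[1.0, 0.0], [0.0, 1.0]]))  # orthogonal frames
```

Duplicate frames give the maximum penalty of 1.0 and orthogonal frames give 0.0, which is the behaviour a "visually diverse" constraint is meant to reward.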


“…A significant number of deep learning based frameworks have been explored recently in solving video summarization [3,22,24,26,28]. K. Zhang et al creatively applied LSTM in supervised video sequence labelling to model video temporal information with good performance [24].…”

Section: Related Work (mentioning)

confidence: 99%

“…K. Zhou et al showed that fully unsupervised learning can outperform many supervised methods by considering diversity and representativeness in reinforcement learning-based framework [28]. Y. Zhang et al introduced adversarial loss to video summarization which learns a dilated temporal relational generator and a discriminator with three-player loss [26].…”

Section: Related Work (mentioning)

confidence: 99%
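The quote credits Zhou et al. [28] with an unsupervised reinforcement-learning summarizer driven by diversity and representativeness, but this page does not reproduce the reward. A minimal sketch, assuming the definitions we recall from that work (treat them as an assumption, not as quoted from this page): the diversity reward is the mean pairwise cosine dissimilarity among selected frames, and the representativeness reward is exp(-mean distance from every frame to its nearest selected frame), with the two summed. The function names `diversity_reward`, `representativeness_reward`, and `total_reward` are hypothetical.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def _cosine_dissimilarity(a: Vector, b: Vector) -> float:
    """1 - cosine similarity; zero vectors are treated as maximally dissimilar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def diversity_reward(features: Sequence[Vector], selected: Sequence[int]) -> float:
    """Assumed R_div: mean dissimilarity over ordered pairs of selected frames."""
    k = len(selected)
    if k < 2:
        return 0.0
    total = sum(
        _cosine_dissimilarity(features[i], features[j])
        for i in selected
        for j in selected
        if i != j
    )
    return total / (k * (k - 1))


def representativeness_reward(features: Sequence[Vector], selected: Sequence[int]) -> float:
    """Assumed R_rep: exp(-mean Euclidean distance of each frame to its
    nearest selected frame). Equals 1.0 when every frame is covered exactly."""
    if not selected:
        return 0.0
    total = sum(min(math.dist(x, features[j]) for j in selected) for x in features)
    return math.exp(-total / len(features))


def total_reward(features: Sequence[Vector], selected: Sequence[int]) -> float:
    """Assumed combined reward: R_div + R_rep."""
    return diversity_reward(features, selected) + representativeness_reward(features, selected)


if __name__ == "__main__":
    frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
    print(total_reward(frames, [0, 1]))  # two distinct frames cover the video
    print(total_reward(frames, [0]))     # one frame leaves frame 1 uncovered
```

The two terms pull in different directions: diversity favours selecting frames that differ from each other, while representativeness favours a selection close to every frame of the video, which is why such frameworks combine them rather than using either alone.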


Comprehensive Video Understanding: Video Summarization with Content-Based Video Recommender Design

Jiang

Cui

Peng

et al. 2019

2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW)


Video summarization aims to extract keyframes/shots from a long video. Previous methods mainly take diversity and representativeness of generated summaries as prior knowledge in algorithm design. In this paper, we formulate video summarization as a content-based recommender problem, which should distill the most useful content from a long video for users who suffer from information overload. A scalable deep neural network is proposed for predicting whether a video segment is useful for users by explicitly modelling both the segment and the video. Moreover, we accomplish scene and action recognition in untrimmed videos in order to find more correlations among different aspects of video understanding tasks. We also discuss the effect of audio and visual features in the summarization task, and extend our work with data augmentation and multi-task learning to prevent the model from early-stage overfitting. Our model won first place in the ICCV 2019 CoView Workshop Challenge Track.


“…Video/text summarization Existing models are either supervised or unsupervised. Unsupervised summarization models in video [9,31,40,41,43,50,52,54,55] and text [10,26,25,3] domains aim to identify a small subset of key units (video-segments/sentences) that preserve the global content of the input, e.g., using criteria like diversity and representativeness. In contrast, supervised video [13,15,16,42,49,51] and text [37,6,28,30,46] summarization methods solve the same problem by employing ground-truth summaries as training targets.…”

Section: Related Work (mentioning)

confidence: 99%

Goal-Driven Sequential Data Abstraction

Muhammad

Yang

Hospedales

et al. 2019

2019 IEEE/CVF International Conference on Computer Vision (ICCV)


Automatic data abstraction is an important capability for both benchmarking machine intelligence and supporting summarization applications. In the former, one asks whether a machine can 'understand' enough about the meaning of input data to produce a meaningful but more compact abstraction. In the latter, this capability is exploited for saving space or human time by summarizing the essence of input data. In this paper, we study a general reinforcement-learning-based framework for learning to abstract sequential data in a goal-driven way. The ability to define different abstraction goals uniquely allows different aspects of the input data to be preserved according to the ultimate purpose of the abstraction. Our reinforcement learning objective does not require human-defined examples of ideal abstraction. Importantly, our model processes the input sequence holistically without being constrained by the original input order. Our framework is also domain-agnostic: we demonstrate applications to sketch, video and text data and achieve promising results in all domains.
