2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr.2018.00153

A Memory Network Approach for Story-Based Temporal Summarization of 360° Videos

Abstract: We address the problem of story-based temporal summarization of long 360° videos. We propose a novel memory network model named Past-Future Memory Network (PFMN), in which we first compute the scores of 81 normal field of view (NFOV) region proposals cropped from the input 360° video, and then recover a latent, collective summary using the network with two external memories that store the embeddings of previously selected subshots and future candidate subshots. Our major contributions are twofold. First, our…

Cited by 54 publications (35 citation statements)
References 46 publications
“…Keys are used to address relevant memories whose corresponding values are returned. Recently, the memory networks have been applied to some vision problems such as personalized image captioning [25], visual tracking [41], movie understanding [23], and summarization [17].…”
Section: Memory Network
confidence: 99%
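The key-value addressing described in the excerpt above can be sketched as follows. This is a minimal illustration of the general mechanism, not the cited models' exact architecture: a query embedding attends over the stored keys, and the softmax-weighted sum of the corresponding values is returned.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def memory_read(query, keys, values):
    """One key-value memory read: the query addresses the keys via
    dot-product attention, and the attention-weighted sum of the
    corresponding values is returned."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
```

A query closely matching one key returns (approximately) that key's value, which is what lets the network retrieve "relevant memories".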
“…It produces a spherical score map for 360° video segments and uses a sliding window kernel to decide which view is suitable for highlighting. Lee et al [156] proposed a past-future memory network with two external memories. The memories are used for storing previously chosen subshots and future candidate subshots' embeddings for temporal summarization of 360° videos.…”
Section: VR Image and Video Editing
confidence: 99%
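The two-memory idea in the excerpt above can be sketched as a selection rule: each candidate subshot is scored against both the past memory (already-chosen subshots) and the future memory (remaining candidates). The scoring function and combination below are assumptions for illustration, not the paper's exact formulation.

```python
def dot(a, b):
    """Dot product of two equal-length embedding vectors."""
    return sum(x * y for x, y in zip(a, b))

def score_candidate(cand, past_memory, future_memory):
    """Hypothetical score: the candidate's best similarity to each
    external memory, averaged (an illustrative choice, not PFMN's
    actual scoring network)."""
    past_sim = max((dot(cand, m) for m in past_memory), default=0.0)
    future_sim = max((dot(cand, m) for m in future_memory), default=0.0)
    return 0.5 * (past_sim + future_sim)

def select_next(candidates, past_memory, future_memory):
    # Pick the candidate subshot that best agrees with what was already
    # selected (past memory) and what may still come (future memory).
    return max(range(len(candidates)),
               key=lambda i: score_candidate(candidates[i],
                                             past_memory, future_memory))
```

Conditioning on both memories is what makes the selection "story-based": each new subshot is judged in the context of the whole (partial) summary rather than in isolation.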
“…So, the goal was to train the summarizer (that contains the generator) in order to maximally confuse the discriminator when trying to distinguish the original from the reconstructed video; a condition that indicates a highly representative keyframe summary. Based on an implemented variation of this model [5] (used to evaluate SUM-GAN when summarizing 360° videos [15]), we scrutinized features of the architecture and the training process that could be fine-tuned to improve the model's performance. As depicted in its block-diagram, this variation: i) contains a linear compression layer which reduces the size of the input feature vectors and the number of learned parameters, ii) follows an incremental approach for training the model's components, and iii) applies a stepwise label-based learning strategy for the adversarial part of the architecture.…”
Section: Building On Adversarial Learning
confidence: 99%
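The adversarial training signal described in the excerpt above can be sketched as a combined objective: the summarizer minimizes reconstruction error while pushing the discriminator's score on the reconstructed video toward the "original" label, which is what "maximally confusing the discriminator" amounts to. The function name, squared-error form, and weighting below are assumptions for illustration, not SUM-GAN's exact losses.

```python
def summarizer_loss(orig, recon, d_on_recon, alpha=0.5):
    """Illustrative SUM-GAN-style summarizer objective (hypothetical
    form): mean squared reconstruction error plus an adversarial term
    that rewards the reconstruction for being scored as 'original'
    (label 1.0) by the discriminator."""
    recon_err = sum((o - r) ** 2 for o, r in zip(orig, recon)) / len(orig)
    adv_err = (1.0 - d_on_recon) ** 2  # fool the discriminator
    return recon_err + alpha * adv_err
```

A perfect reconstruction that the discriminator accepts as original drives both terms, and hence the loss, to zero; a summary that loses information raises the reconstruction term even if the discriminator is fooled.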