Semantic Text Summarization of Long Videos

Sah, Shagan; Kulhare, Sourabh; Gray, Allison; Venugopalan, Subhashini; Prud’hommeaux, Emily; Ptucha, Raymond

doi:10.1109/wacv.2017.115

Cited by 39 publications

(30 citation statements)

References 22 publications

“…(5) Semantic-Sum [19]: a recent method that also identifies the video segments as ours. We find that this method gets best performance when setting sentence length as 3 and using Latent Semantic Analysis [21] in summarization module.…”

Section: Comparison With Other Methods (mentioning)

confidence: 99%

“…(1) When generating each sentence of the paragraph, the method in [29] requires the features from the entire video, which is expensive for very long videos, while our method only requires features in selected proposals. (2) In [19], the clips are selected according to frame quality in advance as a preprocessing step, without taking into account the coherence of narration. This way will lead to redundancy in the resulting paragraph.…”

Section: Related Work (mentioning)

confidence: 99%

Move Forward and Tell: A Progressive Generator of Video Descriptions

Xiong, Dai, Lin (2018)

Computer Vision – ECCV 2018

We present an efficient framework that can generate a coherent paragraph to describe a given video. Previous works on video captioning usually focus on video clips. They typically treat an entire video as a whole and generate the caption conditioned on a single embedding. On the contrary, we consider videos with rich temporal structures and aim to generate paragraph descriptions that can preserve the story flow while being coherent and concise. Towards this goal, we propose a new approach, which produces a descriptive paragraph by assembling temporally localized descriptions. Given a video, it selects a sequence of distinctive clips and generates sentences thereon in a coherent manner. Particularly, the selection of clips and the production of sentences are done jointly and progressively driven by a recurrent network: what to describe next depends on what has been said before. Here, the recurrent network is learned via self-critical sequence training with both sentence-level and paragraph-level rewards. On the ActivityNet Captions dataset, our method demonstrated the capability of generating high-quality paragraph descriptions for videos. Compared to those by other methods, the descriptions produced by our method are often more relevant, more coherent, and more concise.

“…Here the summary is based on sentence to centroid score, cue phrase score, sentence position score, numerical data and tf-idf score. Textual summaries of long videos (Shagan et al, 2017) are generated using recurrent networks where key frames are taken from impactful segments and are converted to textual annotations. The sequence of events in the video are summarized to generate a paragraph description.…”

Section: General Methods (mentioning)

confidence: 99%

Text Summarization Using Morphological Filtering of Intuitionistic Fuzzy Hypergraph

Mohanan, Rao, Jathavedan et al. (2018)

Journal of Computer Science

Text summarization has been an area of interest for many years. It refers to creating a concise text of a document without any loss of information. Researchers in the area of natural language processing have developed many abstractive and extractive methods for creating summaries. Abstractive summaries modify the sentences and create a modified concise form, while extractive summaries pick relevant sentences. The extractive method used in this study is a novel one which models the document as an Intuitionistic Fuzzy Hypergraph (IFHG). This IFHG is subjected to morphological filtering in order to create a concise summary. This is the premier work which applies morphological operations on an IFHG that is modeled on a text. The method has generated a summary which is almost similar to a human-generated summary and showed more accuracy when compared with other machine-generated summaries.

“…In automotive or indoor robotic visual perception problems, simple concatenation techniques perform well but they fall short in some applications like video captioning [10,33] or summarization [42] where long term dependencies are required. LSTMs in such cases offer a better alternative [59,45].…”

Section: Feature Aggregation (mentioning)

confidence: 99%

MultiNet++: Multi-Stream Feature Aggregation and Geometric Loss Strategy for Multi-Task Learning

Chennupati, Sistu, Yogamani et al. (2019)

2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)

Multi-task learning is commonly used in autonomous driving for solving various visual perception tasks. It offers significant benefits in terms of both performance and computational complexity. Current work on multi-task learning networks focuses on processing a single input image and there is no known implementation of multi-task learning handling a sequence of images. In this work, we propose a multi-stream multi-task network to take advantage of using feature representations from preceding frames in a video sequence for joint learning of segmentation, depth, and motion. The weights of the current and previous encoder are shared so that features computed in the previous frame can be leveraged without additional computation. In addition, we propose to use the geometric mean of task losses as a better alternative to the weighted average of task losses. The proposed loss function facilitates better handling of the difference in convergence rates of different tasks. Experimental results on KITTI, Cityscapes and SYNTHIA datasets demonstrate that the proposed strategies outperform various existing multi-task learning solutions.
