TagBook: A Semantic Video Representation Without Supervision for Event Detection

Mazloom, Masoud; Li, Xirong; Snoek, Cees G. M.

doi:10.1109/tmm.2016.2559947

Cited by 38 publications (18 citation statements)

References 54 publications


“…This raises new challenges in searching both within and across videos. The problem of making videos content more accessible has spurred research in automatic tagging [2,39,51] and video summarization [1,15,26,27,31,36,49,57,69]. In automatic tagging, the goal is to predict meta-data in form of tags, which makes videos searchable via text queries.…”

Section: Introduction (mentioning)

confidence: 99%

Query-adaptive Video Summarization via Quality-aware Relevance Estimation

Vasudevan, Gygli, Volokitin, et al. 2017

Proceedings of the 25th ACM International Conference on Multimedia


Although the problem of automatic video summarization has recently received a lot of attention, the problem of creating a video summary that also highlights elements relevant to a search query has been less studied. We address this problem by posing query-relevant summarization as a video frame subset selection problem, which lets us optimise for summaries which are simultaneously diverse, representative of the entire video, and relevant to a text query. We quantify relevance by measuring the distance between frames and queries in a common textual-visual semantic embedding space induced by a neural network. In addition, we extend the model to capture query-independent properties, such as frame quality. We compare our method against previous state of the art on textual-visual embeddings for thumbnail selection and show that our model outperforms them on relevance prediction. Furthermore, we introduce a new dataset, annotated with diversity and query-specific relevance labels. On this dataset, we train and test our complete model for video summarization and show that it outperforms standard baselines such as Maximal Marginal Relevance.
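Maximal Marginal Relevance (MMR), which the abstract names as a baseline, greedily builds a summary by trading relevance to the query against redundancy with frames already picked. The sketch below illustrates only that baseline, not the authors' subset-selection model. It assumes frames and the text query are already embedded in a shared space as NumPy vectors and uses cosine similarity; the function names, the trade-off weight `lam`, and the toy vectors are hypothetical.

```python
import numpy as np


def cosine_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


def mmr_select(frames: np.ndarray, query: np.ndarray, k: int, lam: float = 0.7) -> list[int]:
    """Greedy Maximal Marginal Relevance over frame embeddings (hypothetical helper).

    frames: (n, d) frame embeddings; query: (d,) query embedding in the same space.
    lam weights query relevance (lam=1) against diversity (lam=0).
    """
    relevance = cosine_matrix(frames, query[None, :])[:, 0]  # (n,) similarity to query
    frame_sim = cosine_matrix(frames, frames)                 # (n, n) frame-frame similarity
    selected: list[int] = []
    candidates = list(range(len(frames)))
    while candidates and len(selected) < k:
        def score(i: int) -> float:
            # Penalize similarity to the most similar frame already in the summary.
            redundancy = max(frame_sim[i, j] for j in selected) if selected else 0.0
            return lam * relevance[i] - (1.0 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy 2-D embeddings: frames 0 and 1 are near-duplicates, frame 2 is off-topic.
frames = np.array([[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [0.7, 0.7]])
query = np.array([1.0, 0.2])
print(mmr_select(frames, query, k=2, lam=1.0))  # [1, 0]: pure relevance ranking
print(mmr_select(frames, query, k=2, lam=0.5))  # [1, 2]: near-duplicate frame 0 is skipped
```

With `lam=1.0` the selection reduces to ranking by relevance; lowering `lam` pushes later picks away from frames already chosen, which is the diversity property the abstract asks summaries to have.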


“…Merler et al [25] and Ma et al [26] utilize external images and videos to build an intermediate level video representation for event detection. Mazloom et al [27] learn a video descriptor based on the tags of their nearest neighbors in a large collection of social tagged videos. Song et al [28] extract key segments for event detection by transferring concept knowledge from web images and videos.…”

Section: A. Related Work (mentioning)

confidence: 99%

Probabilistic Semantic Retrieval for Surveillance Videos With Activity Graphs

Chen, Wang, Bai, et al. 2019

IEEE Trans. Multimedia


We present a novel framework for finding complex activities matching user-described queries in cluttered surveillance videos. The wide diversity of queries coupled with unavailability of annotated activity data limits our ability to train activity models. To bridge the semantic gap we propose to let users describe an activity as a semantic graph with object attributes and inter-object relationships associated with nodes and edges, respectively. We learn node/edge-level visual predictors during training and, at test-time, propose to retrieve activity by identifying likely locations that match the semantic graph. We formulate a novel CRF based probabilistic activity localization objective that accounts for mis-detections, mis-classifications and track-losses, and outputs a likelihood score for a candidate grounded location of the query in the video. We seek groundings that maximize overall precision and recall. To handle the combinatorial search over all high-probability groundings, we propose a highest precision subgraph matching algorithm. Our method outperforms existing retrieval methods on benchmarked datasets.
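The citation statement from Chen et al. summarizes TagBook as a video descriptor built from the tags of a video's nearest neighbors in a large collection of socially tagged videos. A minimal sketch of that idea follows. It assumes precomputed visual features, cosine similarity for the neighbor search, a fixed tag vocabulary, and unweighted tag counts normalized to sum to one; the function name `tagbook_vector`, the neighbor count `k`, and the toy data are hypothetical, and the paper's actual neighbor weighting and tag relevance scoring may differ.

```python
import numpy as np


def tagbook_vector(query_feat, collection_feats, collection_tags, vocab, k=3):
    """Describe a video by the tags of its k visually nearest neighbors (sketch).

    query_feat: (d,) visual feature of the video to describe.
    collection_feats: (n, d) visual features of socially tagged videos.
    collection_tags: list of n tag sets, one per collection video.
    vocab: list of tags; each tag is one dimension of the descriptor.
    """
    q = query_feat / np.linalg.norm(query_feat)
    c = collection_feats / np.linalg.norm(collection_feats, axis=1, keepdims=True)
    sims = c @ q                                # cosine similarity to every tagged video
    neighbors = np.argsort(-sims)[:k]           # indices of the k most similar videos
    index = {tag: i for i, tag in enumerate(vocab)}
    vec = np.zeros(len(vocab))
    for n in neighbors:
        for tag in collection_tags[n]:
            if tag in index:                    # ignore tags outside the vocabulary
                vec[index[tag]] += 1.0
    total = vec.sum()
    return vec / total if total > 0 else vec    # L1-normalized tag histogram


# Toy collection: two "bike" videos and two "cake" videos.
feats = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
tags = [{"bike", "trick"}, {"bike", "park"}, {"cake", "kitchen"}, {"cake"}]
vocab = ["bike", "trick", "park", "cake", "kitchen"]
vec = tagbook_vector(np.array([1.0, 0.05]), feats, tags, vocab, k=2)
# Expected weights: bike 0.5, trick 0.25, park 0.25, cake 0.0, kitchen 0.0
```

Because the descriptor lives in tag space rather than feature space, a new video can be described without event-level labels, which matches the "without supervision" framing in the TagBook title.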


“…For instance, [12], [61] conduct unsupervised representation learning by reinforcing the visual representations generated from hand-craft features through the use of freely available social tags or text descriptions of web videos. Neural networks are used to construct unsupervised feature representations via auto-encoders [62], [63] and restricted Boltzmann machines (RBMs) [64].…”

Section: B. Representation Learning (mentioning)

confidence: 99%

Unsupervised t-Distributed Video Hashing and Its Deep Hashing Extension

Hao, Goulermas, et al. 2017

IEEE Trans. on Image Process.


In this work, a novel unsupervised hashing algorithm, referred to as t-USMVH, and its extension to unsupervised deep hashing, referred to as t-UDH, are proposed to support large-scale video-to-video retrieval. To improve robustness of the unsupervised learning, t-USMVH combines multiple types of feature representations and effectively fuses them by examining a continuous relevance score based on a Gaussian estimation over pairwise distances, and also a discrete neighbor score based on the cardinality of reciprocal neighbors. To reduce sensitivity to scale changes for mapping objects that are far apart from each other, Student t-distribution is used to estimate the similarity between the relaxed hash code vectors for keyframes. This results in more accurate preservation of the desired unsupervised similarity structure in the hash code space. By adapting the corresponding optimization objective and constructing the hash mapping function via a deep neural network, we develop a robust unsupervised training strategy for a deep hashing network. The efficiency and effectiveness of the proposed methods are evaluated on two public video collections via comparisons against multiple classical and state-of-the-art methods.
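The abstract states that t-USMVH uses a Student t-distribution to estimate similarity between relaxed hash codes, so that objects far apart are less sensitive to scale changes. A minimal sketch of that kind of similarity follows. It assumes the one-degree-of-freedom kernel familiar from t-SNE and a single normalization over all off-diagonal pairs; the function name `t_similarities` and the toy codes are hypothetical, and the paper's exact degrees of freedom, normalization, and training objective may differ.

```python
import numpy as np


def t_similarities(codes: np.ndarray) -> np.ndarray:
    """Normalized Student-t (one degree of freedom) similarities between codes.

    codes: (n, b) relaxed, real-valued hash codes for n keyframes, n >= 2.
    Returns an (n, n) symmetric matrix with a zero diagonal that sums to 1.
    """
    if codes.shape[0] < 2:
        raise ValueError("need at least two codes")
    diff = codes[:, None, :] - codes[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)      # pairwise squared Euclidean distances
    kernel = 1.0 / (1.0 + sq_dist)            # heavy-tailed Student-t kernel
    np.fill_diagonal(kernel, 0.0)             # exclude self-pairs
    return kernel / kernel.sum()              # normalize over all pairs


codes = np.array([[1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]])  # toy relaxed codes
print(t_similarities(codes).round(3))
```

The heavy tail is the point of the design: at squared distance 9 the t kernel still gives 1/(1 + 9) = 0.1, whereas a Gaussian kernel exp(-d²/2) gives about 0.011, so distant pairs keep a non-negligible similarity instead of collapsing toward zero.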
