2020
DOI: 10.1145/3377875

Shuffled ImageNet Banks for Video Event Detection and Search

Abstract: This article aims for the detection and search of events in videos, where video examples are either scarce or even absent during training. To enable such event detection and search, ImageNet concept banks have been shown to be effective. Rather than employing the standard concept bank of 1,000 ImageNet classes, we leverage the full 21,841-class dataset. We identify two problems with using the full dataset: (i) there is an imbalance between the number of examples per concept, and (ii) not all concepts are equally re…
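
To make the concept-bank idea concrete, here is a minimal sketch (not the authors' implementation) of zero-shot event scoring over a large concept bank. The inputs query_vec, concept_vecs, and concept_probs are hypothetical stand-ins: an embedding of the event description, embeddings of the concept names, and per-video concept probabilities from a pre-trained network. Keeping only the most query-relevant concepts reflects problem (ii) from the abstract.

    import numpy as np

    # Minimal sketch of concept-bank event scoring (illustrative only).
    # An event query is matched to concept names by embedding similarity,
    # and the scores of the top-matching concepts are pooled per video.
    def event_score(query_vec, concept_vecs, concept_probs, top_k=100):
        # query_vec:     (d,) embedding of the event description
        # concept_vecs:  (C, d) embeddings of the C concept names
        # concept_probs: (C,) concept probabilities for one video
        sims = concept_vecs @ query_vec
        sims /= (np.linalg.norm(concept_vecs, axis=1)
                 * np.linalg.norm(query_vec) + 1e-12)
        top = np.argsort(sims)[-top_k:]   # most event-relevant concepts
        return float(np.dot(sims[top], concept_probs[top]))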

Cited by 28 publications (19 citation statements)
References 63 publications
“…The effectiveness of embedding-based methods such as the W2VV++ model used by last year's highest-scoring system, SOMHunter [26], also shown in an evaluation of SOMHunter and vitrivr [53], makes such models a valuable addition to retrieval systems. The W2VV++ model and its variants [31,34,40] were used by VIRET, SOMHunter, and the VBS 2020 winner, and, in the form of features for image search, by CollageHunter. vitrivr and vitrivr-VR used a similar approach [68].…”
Section: Text Search
confidence: 99%
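
As a rough illustration of how such embedding-based text search operates, the sketch below ranks precomputed video vectors by cosine similarity to an embedded query. It assumes a hypothetical encode_text function and video_vecs matrix; it is a schematic of the general technique, not the W2VV++ code.

    import numpy as np

    # Schematic embedding-based text-to-video search: the query is mapped
    # into the same space as precomputed video vectors and ranked by
    # cosine similarity. encode_text and video_vecs are stand-ins.
    def search(query, encode_text, video_vecs, video_ids, top_n=10):
        q = encode_text(query)                              # (d,) embedding
        q /= (np.linalg.norm(q) + 1e-12)
        v = video_vecs / (np.linalg.norm(video_vecs, axis=1,
                                         keepdims=True) + 1e-12)
        sims = v @ q                                        # cosine scores
        order = np.argsort(sims)[::-1][:top_n]              # best first
        return [(video_ids[i], float(sims[i])) for i in order]
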
“…VBS 2020 witnessed various search models based on different text-image matching strategies. The SOMHunter and VIRET systems relied on the same BoW variant of the W2VV++ model [33,35], a query representation learning approach employing visual features obtained from deep networks trained with a high number of classes [43,44]. For more details about the employed W2VV++ variant and the similarity used by each system, we refer to [35].…”
Section: Text Search
confidence: 99%
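
The bag-of-words variant mentioned above can be pictured as follows: the query text becomes a fixed-length vocabulary count vector, which a learned projection maps into the visual feature space. This sketch uses a toy vocabulary and a random stand-in for the learned projection; it is an assumption-based illustration, not the actual model.

    import numpy as np

    # Toy bag-of-words query representation; vocab and W are hypothetical.
    def bow_vector(query, vocab):
        vec = np.zeros(len(vocab), dtype=np.float32)
        for token in query.lower().split():
            if token in vocab:       # out-of-vocabulary words are dropped
                vec[vocab[token]] += 1.0
        return vec

    vocab = {"person": 0, "riding": 1, "horse": 2, "beach": 3}
    W = np.random.randn(len(vocab), 8).astype(np.float32)  # stand-in projection
    query_visual = bow_vector("person riding a horse", vocab) @ W  # (8,) vector
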
“…Similar to the common word embedding setup, for a video v ∈ V, we seek to obtain a score for action a ∈ A using a set of global objects G. Global objects generally come from deep networks (Mettes et al. 2020) pre-trained on large-scale object datasets (Deng et al. 2009). We build upon current semantic matching approaches by providing three simple priors that deal with semantic ambiguity, non-discriminative objects, and object naming.…”
Section: Priors for Ambiguity, Discrimination, and Naming
confidence: 99%
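
The semantic matching this passage builds on can be written as a weighted sum: an action's score is the object probabilities weighted by the embedding similarity between the action name and each object name. The sketch below assumes hypothetical inputs action_vec, object_vecs, and object_probs, and shows the baseline before any of the three priors are applied.

    import numpy as np

    # Baseline semantic matching prior to any priors: score an action by
    # object probabilities weighted by name-embedding similarity.
    def action_score(action_vec, object_vecs, object_probs):
        # action_vec:   (d,) embedding of the action name
        # object_vecs:  (|G|, d) embeddings of the object names in G
        # object_probs: (|G|,) global object scores for the video
        sims = object_vecs @ action_vec
        sims /= (np.linalg.norm(object_vecs, axis=1)
                 * np.linalg.norm(action_vec) + 1e-12)
        return float(np.dot(sims, object_probs))
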
“…The pre-trained network includes the person class and 79 objects, such as car, chair, and tv. For the global object scores over whole videos, we apply a GoogLeNet (Szegedy et al. 2015), pre-trained on 12,988 ImageNet categories (Mettes et al. 2020). The object probability distributions are averaged over the sampled frames to obtain the global object scores.…”
Section: Object Priors Sources
confidence: 99%
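
The frame-averaging step described here is simple to state in code. The sketch below assumes frame_probs, a hypothetical array of per-frame softmax distributions over object classes for one video.

    import numpy as np

    # Global object scores as described above: average per-frame class
    # probability distributions over the sampled frames of one video.
    def global_object_scores(frame_probs):
        # frame_probs: (num_frames, num_classes) per-frame probabilities
        return frame_probs.mean(axis=0)   # (num_classes,) global scores

    frames = np.random.dirichlet(np.ones(80), size=16)  # 16 frames, 80 classes
    scores = global_object_scores(frames)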