2020
DOI: 10.1145/3377875

Shuffled ImageNet Banks for Video Event Detection and Search

Abstract: This article aims for the detection and search of events in videos, where video examples are either scarce or even absent during training. To enable such event detection and search, ImageNet concept banks have been shown to be effective. Rather than employing the standard concept bank of 1,000 ImageNet classes, we leverage the full 21,841-class dataset. We identify two problems with using the full dataset: (i) there is an imbalance between the number of examples per concept, and (ii) not all concepts are equally re…
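
To make the concept-bank idea concrete, here is a minimal sketch (not the authors' implementation) of zero-shot event scoring over a large concept bank. The inputs query_vec, concept_vecs, and concept_probs are hypothetical stand-ins: an embedding of the event description, embeddings of the concept names, and per-video concept probabilities from a pre-trained network. Keeping only the most query-relevant concepts reflects problem (ii) from the abstract.

    import numpy as np

    # Minimal sketch of concept-bank event scoring (illustrative only).
    # An event query is matched to concept names by embedding similarity,
    # and the scores of the top-matching concepts are pooled per video.
    def event_score(query_vec, concept_vecs, concept_probs, top_k=100):
        # query_vec:     (d,) embedding of the event description
        # concept_vecs:  (C, d) embeddings of the C concept names
        # concept_probs: (C,) concept probabilities for one video
        sims = concept_vecs @ query_vec
        sims /= (np.linalg.norm(concept_vecs, axis=1)
                 * np.linalg.norm(query_vec) + 1e-12)
        top = np.argsort(sims)[-top_k:]   # most event-relevant concepts
        return float(np.dot(sims[top], concept_probs[top]))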

Cited by 28 publications (19 citation statements)
References 63 publications
“…The effectiveness of embedding-based methods such as the W2VV++ model used by last year's highest-scoring system, SOMHunter [26], also shown in an evaluation of SOMHunter and vitrivr [53], makes such models a valuable addition to retrieval systems. The W2VV++ model and its variants [31,34,40] were used by VIRET, SOMHunter, and the VBS 2020 winner, and, in the form of features for image search, by CollageHunter. vitrivr and vitrivr-VR used a similar approach [68].…”
Section: Text Search
confidence: 99%
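
As a rough illustration of how such embedding-based text search operates, the sketch below ranks precomputed video vectors by cosine similarity to an embedded query. It assumes a hypothetical encode_text function and video_vecs matrix; it is a schematic of the general technique, not the W2VV++ code.

    import numpy as np

    # Schematic embedding-based text-to-video search: the query is mapped
    # into the same space as precomputed video vectors and ranked by
    # cosine similarity. encode_text and video_vecs are stand-ins.
    def search(query, encode_text, video_vecs, video_ids, top_n=10):
        q = encode_text(query)                              # (d,) embedding
        q /= (np.linalg.norm(q) + 1e-12)
        v = video_vecs / (np.linalg.norm(video_vecs, axis=1,
                                         keepdims=True) + 1e-12)
        sims = v @ q                                        # cosine scores
        order = np.argsort(sims)[::-1][:top_n]              # best first
        return [(video_ids[i], float(sims[i])) for i in order]
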
“…VBS 2020 witnessed various search models based on different text-image matching strategies. The SOMHunter and VIRET systems relied on the same BoW variant of the W2VV++ model [33,35], a query representation learning approach employing visual features obtained from deep networks trained with a high number of classes [43,44]. For more details about the employed W2VV++ variant and the similarity used by each system, we refer to [35].…”
Section: Text Search
confidence: 99%
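
The bag-of-words variant mentioned above can be pictured as follows: the query text becomes a fixed-length vocabulary count vector, which a learned projection maps into the visual feature space. This sketch uses a toy vocabulary and a random stand-in for the learned projection; it is an assumption-based illustration, not the actual model.

    import numpy as np

    # Toy bag-of-words query representation; vocab and W are hypothetical.
    def bow_vector(query, vocab):
        vec = np.zeros(len(vocab), dtype=np.float32)
        for token in query.lower().split():
            if token in vocab:       # out-of-vocabulary words are dropped
                vec[vocab[token]] += 1.0
        return vec

    vocab = {"person": 0, "riding": 1, "horse": 2, "beach": 3}
    W = np.random.randn(len(vocab), 8).astype(np.float32)  # stand-in projection
    query_visual = bow_vector("person riding a horse", vocab) @ W  # (8,) vector
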
“…Similar to the common word embedding setup, for a video v ∈ V, we seek to obtain a score for action a ∈ A using a set of global objects G. Global objects generally come from deep networks (Mettes et al. 2020) pre-trained on large-scale object datasets (Deng et al. 2009). We build upon current semantic matching approaches by providing three simple priors that deal with semantic ambiguity, non-discriminative objects, and object naming.…”
Section: Priors for Ambiguity, Discrimination, and Naming
confidence: 99%
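
The semantic matching this passage builds on can be written as a weighted sum: an action's score is the object probabilities weighted by the embedding similarity between the action name and each object name. The sketch below assumes hypothetical inputs action_vec, object_vecs, and object_probs, and shows the baseline before any of the three priors are applied.

    import numpy as np

    # Baseline semantic matching prior to any priors: score an action by
    # object probabilities weighted by name-embedding similarity.
    def action_score(action_vec, object_vecs, object_probs):
        # action_vec:   (d,) embedding of the action name
        # object_vecs:  (|G|, d) embeddings of the object names in G
        # object_probs: (|G|,) global object scores for the video
        sims = object_vecs @ action_vec
        sims /= (np.linalg.norm(object_vecs, axis=1)
                 * np.linalg.norm(action_vec) + 1e-12)
        return float(np.dot(sims, object_probs))
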
“…The pre-trained network includes the person class and 79 objects, such as car, chair, and tv. For the global object scores over whole videos, we apply a GoogLeNet (Szegedy et al. 2015), pre-trained on 12,988 ImageNet categories (Mettes et al. 2020). The object probability distributions are averaged over the sampled frames to obtain the global object scores.…”
Section: Object Priors Sources
confidence: 99%
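
The frame-averaging step described here is simple to state in code. The sketch below assumes frame_probs, a hypothetical array of per-frame softmax distributions over object classes for one video.

    import numpy as np

    # Global object scores as described above: average per-frame class
    # probability distributions over the sampled frames of one video.
    def global_object_scores(frame_probs):
        # frame_probs: (num_frames, num_classes) per-frame probabilities
        return frame_probs.mean(axis=0)   # (num_classes,) global scores

    frames = np.random.dirichlet(np.ones(80), size=16)  # 16 frames, 80 classes
    scores = global_object_scores(frames)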