2014 IEEE Conference on Computer Vision and Pattern Recognition
DOI: 10.1109/cvpr.2014.343

Bags of Spacetime Energies for Dynamic Scene Recognition

Abstract: This paper presents a unified bag of visual words (BoW) framework for dynamic scene recognition. The approach builds on primitive features that uniformly capture the spatial and temporal orientation structure of the imagery (e.g., video), as extracted via application of a bank of spatiotemporally oriented filters. Various feature encoding techniques are investigated to abstract the primitives to an intermediate representation that is best suited to dynamic scene representation. Further, a novel approach to adaptive…
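The pipeline the abstract outlines (oriented 3D filtering, encoding against a learned vocabulary, pooling into a histogram) can be illustrated compactly. The following is a minimal sketch under stated assumptions, not the paper's implementation: the random stand-in filter bank, the codebook size, and every function name here are assumptions.

```python
# Illustrative BoW pipeline over spatiotemporal oriented filter responses.
# All names, data, and parameter choices are stand-ins, not the paper's code.
import numpy as np
from scipy.ndimage import convolve
from scipy.cluster.vq import kmeans2, vq

def oriented_energy(video, filters):
    """Rectified (squared) responses of a bank of 3D (t, y, x) filters.

    video:   (T, H, W) grayscale clip
    filters: list of K small 3D kernels tuned to spacetime orientations
    returns: (T, H, W, K) per-pixel orientation energy
    """
    responses = [convolve(video, f, mode="nearest") ** 2 for f in filters]
    return np.stack(responses, axis=-1)

def bow_histogram(energies, codebook):
    """Quantize per-pixel energy vectors against a learned codebook and
    pool the assignments into a normalized bag-of-words histogram."""
    feats = energies.reshape(-1, energies.shape[-1]).astype(np.float64)
    words, _ = vq(feats, codebook)
    hist = np.bincount(words, minlength=len(codebook)).astype(np.float64)
    return hist / hist.sum()

# Usage: learn a codebook from training features, then describe a clip.
rng = np.random.default_rng(0)
video = rng.random((16, 64, 64))                  # stand-in clip
filters = [rng.standard_normal((5, 5, 5)) for _ in range(8)]  # stand-in bank
energies = oriented_energy(video, filters)
codebook, _ = kmeans2(energies.reshape(-1, 8), k=32, seed=0)
print(bow_histogram(energies, codebook).shape)    # (32,)
```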

Cited by 52 publications (43 citation statements) · References 28 publications
“…Note that no state-of-the-art results have been reported in the literature on our second test protocol. Our approach outperforms the state-of-the-art result of Feichtenhofer (Feichtenhofer et al., 2014) by more than 7%. While this may be attributed in part to the CNN features, note that our approach still outperforms GDA and CDL based on the same features.…”
Section: Scene Classification
confidence: 78%
“…For a fair comparison with previous studies on Maryland and Yupenn, we followed the leave-one-out evaluation protocol and reported classification results using an SVM. Table 5 shows the comparison results, which can be divided into hand-crafted features [20,21,9,16,10] … may be because of their temporal encoding based on differences between two adjacent frames, which motivated this study. In contrast, our D3d based on key segments shows good performance.…”
Section: Dynamic Scene Dataset
confidence: 99%
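For reference, the leave-one-out protocol this excerpt mentions amounts to training on all videos but one and testing on the held-out video, once per video. A hedged sketch with scikit-learn follows; the features and labels are random placeholders, not dataset descriptors.

```python
# Sketch of leave-one-out SVM evaluation; X and y are stand-in data.
import numpy as np
from sklearn.model_selection import LeaveOneOut
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.random((30, 128))          # one descriptor per video (placeholder)
y = rng.integers(0, 3, size=30)    # scene-class labels (placeholder)

correct = 0
for train_idx, test_idx in LeaveOneOut().split(X):
    clf = SVC(kernel="linear").fit(X[train_idx], y[train_idx])
    correct += int(clf.predict(X[test_idx])[0] == y[test_idx][0])
print(f"leave-one-out accuracy: {correct / len(X):.2%}")
```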
“…To better represent dynamic scenes, Derpanis et al [6] introduced multi-scale orientation features using 3D Gaussian third-derivative filters. The bag of features (BoF) scheme [8] was additionally applied to represent several spatiotemporal patches in dynamic scenes [9,10]. Encouraged by the promising results of convolutional neural networks (CNNs) [11,12,13], Tran et al [14] recently proposed a convolutional three-dimensional (C3D) architecture that is a spatiotemporal version of CNN.…”
Section: Introduction
confidence: 99%
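For intuition about the oriented filters cited in this excerpt (3D Gaussian third-derivative filters), here is a rough sketch of one steered kernel; the steering formula, sigma, and support size are illustrative assumptions, not the cited paper's exact construction.

```python
# Illustrative 3D Gaussian third-derivative (G3) kernel steered to a
# spacetime direction; parameters and normalization are assumptions.
import numpy as np

def g3_kernel(direction, sigma=1.0, radius=4):
    """Third derivative of a 3D Gaussian along a unit (x, y, t) direction."""
    n = np.asarray(direction, dtype=np.float64)
    n /= np.linalg.norm(n)
    ax = np.arange(-radius, radius + 1, dtype=np.float64)
    x, y, t = np.meshgrid(ax, ax, ax, indexing="ij")
    u = x * n[0] + y * n[1] + t * n[2]           # coordinate along direction
    gauss = np.exp(-(x**2 + y**2 + t**2) / (2 * sigma**2))
    kernel = -(u**3 - 3 * u * sigma**2) / sigma**6 * gauss
    return kernel / np.abs(kernel).sum()         # crude normalization

# A small bank spanning static, horizontal-motion, and flicker structure.
bank = [g3_kernel(d) for d in [(1, 0, 0), (0, 1, 0), (1, 0, 1), (0, 0, 1)]]
print(bank[0].shape)  # (9, 9, 9)
```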
“…They are encoded using a learned dictionary and then dynamically pooled. The technique currently holds the highest accuracy on the two mentioned datasets [22]. … (VLAD) to obtain better than state-of-the-art performance for the event detection problem [30]. Off-the-shelf descriptors were used to obtain a high score on the TRECVID-MED dataset.…”
Section: This Sparked a Lot of Recent Research Work on Architectures
confidence: 99%
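VLAD, as referenced in this excerpt, replaces a hard-assignment count histogram with per-codeword residual sums. A minimal sketch follows; the codebook size, normalization choices, and data are illustrative assumptions.

```python
# Sketch of VLAD encoding: sum descriptor residuals about their nearest
# codewords, then power- and L2-normalize. Data sizes are stand-ins.
import numpy as np
from scipy.cluster.vq import kmeans2, vq

def vlad(descriptors, codebook):
    """Aggregate residuals of descriptors about their nearest codewords."""
    assign, _ = vq(descriptors, codebook)
    k, d = codebook.shape
    enc = np.zeros((k, d))
    for i in range(k):
        members = descriptors[assign == i]
        if len(members):
            enc[i] = (members - codebook[i]).sum(axis=0)
    enc = np.sign(enc) * np.sqrt(np.abs(enc))        # power normalization
    return (enc / (np.linalg.norm(enc) + 1e-12)).ravel()

rng = np.random.default_rng(0)
descs = rng.random((500, 64))                        # local features (placeholder)
codebook, _ = kmeans2(descs, k=16, seed=0)
print(vlad(descs, codebook).shape)                   # (1024,)
```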