2014 IEEE Conference on Computer Vision and Pattern Recognition
DOI: 10.1109/cvpr.2014.343

Bags of Spacetime Energies for Dynamic Scene Recognition

Abstract: This paper presents a unified bag of visual words (BoW) framework for dynamic scene recognition. The approach builds on primitive features that uniformly capture the spatial and temporal orientation structure of the imagery (e.g., video), as extracted via application of a bank of spatiotemporally oriented filters. Various feature encoding techniques are investigated to abstract the primitives to an intermediate representation that is best suited to dynamic scene representation. Further, a novel approach to adaptive…
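The pipeline the abstract outlines (oriented 3D filtering, encoding against a learned vocabulary, pooling into a histogram) can be illustrated compactly. The following is a minimal sketch under stated assumptions, not the paper's implementation: the random stand-in filter bank, the codebook size, and every function name here are assumptions.

```python
# Illustrative BoW pipeline over spatiotemporal oriented filter responses.
# All names, data, and parameter choices are stand-ins, not the paper's code.
import numpy as np
from scipy.ndimage import convolve
from scipy.cluster.vq import kmeans2, vq

def oriented_energy(video, filters):
    """Rectified (squared) responses of a bank of 3D (t, y, x) filters.

    video:   (T, H, W) grayscale clip
    filters: list of K small 3D kernels tuned to spacetime orientations
    returns: (T, H, W, K) per-pixel orientation energy
    """
    responses = [convolve(video, f, mode="nearest") ** 2 for f in filters]
    return np.stack(responses, axis=-1)

def bow_histogram(energies, codebook):
    """Quantize per-pixel energy vectors against a learned codebook and
    pool the assignments into a normalized bag-of-words histogram."""
    feats = energies.reshape(-1, energies.shape[-1]).astype(np.float64)
    words, _ = vq(feats, codebook)
    hist = np.bincount(words, minlength=len(codebook)).astype(np.float64)
    return hist / hist.sum()

# Usage: learn a codebook from training features, then describe a clip.
rng = np.random.default_rng(0)
video = rng.random((16, 64, 64))                  # stand-in clip
filters = [rng.standard_normal((5, 5, 5)) for _ in range(8)]  # stand-in bank
energies = oriented_energy(video, filters)
codebook, _ = kmeans2(energies.reshape(-1, 8), k=32, seed=0)
print(bow_histogram(energies, codebook).shape)    # (32,)
```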

Cited by 52 publications (43 citation statements) · References 28 publications
“…Note that no state-of-the-art results have been reported in the literature on our second test protocol. Our approach outperforms the state-of-the-art result of Feichtenhofer (Feichtenhofer et al., 2014) by more than 7%. While this may be attributed in part to the CNN features, note that our approach still outperforms GDA and CDL based on the same features.…”
Section: Scene Classification
confidence: 78%
“…For a fair comparison with previous studies on Maryland and Yupenn, we followed the leave-one-out evaluation protocol and reported classification results using an SVM. Table 5 shows the comparison results, which can be divided into hand-crafted features [20,21,9,16,10] … may be because of their temporal encoding based on differences between two adjacent frames, which motivated this study. In contrast, our D3d based on key segments shows good performance.…”
Section: Dynamic Scene Dataset
confidence: 99%
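For reference, the leave-one-out protocol this excerpt mentions amounts to training on all videos but one and testing on the held-out video, once per video. A hedged sketch with scikit-learn follows; the features and labels are random placeholders, not dataset descriptors.

```python
# Sketch of leave-one-out SVM evaluation; X and y are stand-in data.
import numpy as np
from sklearn.model_selection import LeaveOneOut
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.random((30, 128))          # one descriptor per video (placeholder)
y = rng.integers(0, 3, size=30)    # scene-class labels (placeholder)

correct = 0
for train_idx, test_idx in LeaveOneOut().split(X):
    clf = SVC(kernel="linear").fit(X[train_idx], y[train_idx])
    correct += int(clf.predict(X[test_idx])[0] == y[test_idx][0])
print(f"leave-one-out accuracy: {correct / len(X):.2%}")
```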
“…To better represent dynamic scenes, Derpanis et al [6] introduced multi-scale orientation features using 3D Gaussian third-derivative filters. The bag of features (BoF) scheme [8] was additionally applied to represent several spatiotemporal patches in dynamic scenes [9,10]. Encouraged by the promising results of convolutional neural networks (CNNs) [11,12,13], Tran et al [14] recently proposed a convolutional three-dimensional (C3D) architecture that is a spatiotemporal version of CNN.…”
Section: Introduction
confidence: 99%
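For intuition about the oriented filters cited in this excerpt (3D Gaussian third-derivative filters), here is a rough sketch of one steered kernel; the steering formula, sigma, and support size are illustrative assumptions, not the cited paper's exact construction.

```python
# Illustrative 3D Gaussian third-derivative (G3) kernel steered to a
# spacetime direction; parameters and normalization are assumptions.
import numpy as np

def g3_kernel(direction, sigma=1.0, radius=4):
    """Third derivative of a 3D Gaussian along a unit (x, y, t) direction."""
    n = np.asarray(direction, dtype=np.float64)
    n /= np.linalg.norm(n)
    ax = np.arange(-radius, radius + 1, dtype=np.float64)
    x, y, t = np.meshgrid(ax, ax, ax, indexing="ij")
    u = x * n[0] + y * n[1] + t * n[2]           # coordinate along direction
    gauss = np.exp(-(x**2 + y**2 + t**2) / (2 * sigma**2))
    kernel = -(u**3 - 3 * u * sigma**2) / sigma**6 * gauss
    return kernel / np.abs(kernel).sum()         # crude normalization

# A small bank spanning static, horizontal-motion, and flicker structure.
bank = [g3_kernel(d) for d in [(1, 0, 0), (0, 1, 0), (1, 0, 1), (0, 0, 1)]]
print(bank[0].shape)  # (9, 9, 9)
```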
“…They are encoded using a learned dictionary and then dynamically pooled. The technique currently holds the highest accuracy on the two mentioned datasets [22]. … (VLAD) to obtain better than state-of-the-art performance for the event detection problem [30]. Off-the-shelf descriptors were used to obtain a high score on the TRECVID-MED dataset.…”
Section: This Sparked a Lot of Recent Research Work on Architectures
confidence: 99%
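VLAD, as referenced in this excerpt, replaces a hard-assignment count histogram with per-codeword residual sums. A minimal sketch follows; the codebook size, normalization choices, and data are illustrative assumptions.

```python
# Sketch of VLAD encoding: sum descriptor residuals about their nearest
# codewords, then power- and L2-normalize. Data sizes are stand-ins.
import numpy as np
from scipy.cluster.vq import kmeans2, vq

def vlad(descriptors, codebook):
    """Aggregate residuals of descriptors about their nearest codewords."""
    assign, _ = vq(descriptors, codebook)
    k, d = codebook.shape
    enc = np.zeros((k, d))
    for i in range(k):
        members = descriptors[assign == i]
        if len(members):
            enc[i] = (members - codebook[i]).sum(axis=0)
    enc = np.sign(enc) * np.sqrt(np.abs(enc))        # power normalization
    return (enc / (np.linalg.norm(enc) + 1e-12)).ravel()

rng = np.random.default_rng(0)
descs = rng.random((500, 64))                        # local features (placeholder)
codebook, _ = kmeans2(descs, k=16, seed=0)
print(vlad(descs, codebook).shape)                   # (1024,)
```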