2014 IEEE Conference on Computer Vision and Pattern Recognition 2014
DOI: 10.1109/cvpr.2014.326

Efficient Action Localization with Approximately Normalized Fisher Vectors

Abstract: The Fisher vector (FV) representation is a high-dimensional extension of the popular bag-of-word representation. Transformation of the FV by power and L2 normalizations has been shown to significantly improve its performance. With these normalizations included, this representation has yielded state-of-the-art results for a wide number of image and video classification and retrieval tasks. The normalizations, however, render the representation non-additive over local descriptors. Combined …
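
The non-additivity mentioned in the abstract can be illustrated with a minimal sketch. It assumes that an unnormalized vector for a window is the sum of the vectors of its segments, and it uses toy three-dimensional values in place of real Fisher vectors. The function name `power_l2_normalize`, the exponent `alpha=0.5`, and the sample data are illustrative assumptions, not taken from the paper.

```python
import math

def power_l2_normalize(v, alpha=0.5):
    """Signed power normalization followed by L2 normalization."""
    powered = [math.copysign(abs(x) ** alpha, x) for x in v]
    norm = math.sqrt(sum(x * x for x in powered))
    return [x / norm for x in powered] if norm > 0 else powered

def add(u, v):
    """Element-wise sum of two equal-length vectors."""
    return [a + b for a, b in zip(u, v)]

# Two hypothetical unnormalized per-segment vectors (toy values).
seg_a = [4.0, 0.0, -1.0]
seg_b = [0.0, 9.0, -1.0]

# Before normalization, the window vector is simply the sum of its parts.
window = add(seg_a, seg_b)

# Normalizing the sum is not the same as summing the normalized parts.
norm_of_sum = power_l2_normalize(window)
sum_of_norms = add(power_l2_normalize(seg_a), power_l2_normalize(seg_b))

is_additive = all(math.isclose(p, q) for p, q in zip(norm_of_sum, sum_of_norms))
print(is_additive)
```

In this toy case the two results differ, which is why a normalized score for a window cannot be assembled directly from normalized scores of its segments.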

Cited by 66 publications (49 citation statements)
References 33 publications (66 reference statements)
“…We attribute this improvement to the fact that our approach scans the video in a much more efficient way. We obtain a similar performance to Caba Heilbron et al [4] and Oneata et al [25]. This result is encouraging given that our detection pipeline operates at a much faster rate of 134 FPS.…”
Section: Daps For Action Detection (supporting)
confidence: 76%
“…Action Detection: In contrast to object detection methods, the dominant approach for action detection is still to use a sliding window approach [26,18,12] combined with action classifiers trained on multiple features [2,9,33]. Previous approaches have reduced the computational overhead of sliding window search by using branch-and-bound techniques [5,27] and exploiting some characteristics of the visual descriptors. In contrast, our model efficiently reduces the number of evaluated windows by encoding a sequence of visual descriptors.…”
Section: Related Work (mentioning)
confidence: 99%
“…Furthermore, only linear classifier is required by FV, which is a huge advantage in large-scale problems. Therefore, FV and its simplified non-probabilistic version VLAD become more popular in action recognition [3,10,12,21,22,24,34,36].…”
Section: Introduction (mentioning)
confidence: 99%
“…Among them, VLAD and FV show outstanding performances for human action recognition [5][24] [25] [27][28] [29]. Compared with BoW in Figure 1, VLAD records the 1st-order difference between local features and codewords, i.e., the residual vectors generated by hard assignment.…”
Section: Introduction (mentioning)
confidence: 99%
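
The VLAD description in the last statement (first-order residuals accumulated under hard assignment) can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `vlad`, the two-dimensional codewords, and the sample descriptors are hypothetical, and the normalization steps usually applied afterwards are omitted.

```python
def vlad(descriptors, codewords):
    """Sum, per codeword, the residuals of the descriptors assigned to it."""
    dim = len(codewords[0])
    residuals = [[0.0] * dim for _ in codewords]
    for x in descriptors:
        # Hard assignment: pick the nearest codeword (squared Euclidean distance).
        k = min(
            range(len(codewords)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(x, codewords[i])),
        )
        # Accumulate the first-order difference between descriptor and codeword.
        for d in range(dim):
            residuals[k][d] += x[d] - codewords[k][d]
    # Concatenate the per-codeword residual sums into one vector.
    return [value for row in residuals for value in row]

# Toy data: two codewords and three local descriptors (illustrative values).
codewords = [[0.0, 0.0], [10.0, 10.0]]
descriptors = [[1.0, 2.0], [-1.0, 0.0], [9.0, 12.0]]

encoding = vlad(descriptors, codewords)
print(encoding)
```

The encoding has one block of residual sums per codeword, so its length is the number of codewords times the descriptor dimension.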