2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr.2016.290
Predicting the Where and What of Actors and Actions through Online Action Localization

Cited by 74 publications (89 citation statements)
References 33 publications
“…Li et al.'s work [27] exploits sequence mining, where a series of actions and object co-occurrences are encoded as symbolic sequences. Soomro et al. [43] propose to use binary SVMs to localize and classify video snippets into sub-action categories, and obtain the final class label in an online manner using dynamic programming. In [50], action prediction is approached using still images with action-scene correlations.…”
Section: Related Work
confidence: 99%
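The sub-action labeling described above (binary SVM scores per snippet, combined by dynamic programming) can be sketched as a Viterbi-style DP. This is an illustrative assumption about the formulation, not the exact method of Soomro et al. [43]: snippets are assigned to ordered sub-actions, and the assignment may only stay in the current sub-action or advance to the next one.

```python
# Hypothetical sketch: monotone DP over ordered sub-actions.
# scores[t][k] is the (e.g. SVM) score of snippet t under sub-action k.

def best_subaction_path(scores):
    """Return (best_total_score, labels), labels non-decreasing in k."""
    T, K = len(scores), len(scores[0])
    NEG = float("-inf")
    dp = [[NEG] * K for _ in range(T)]
    back = [[0] * K for _ in range(T)]
    dp[0][0] = scores[0][0]  # assume the action starts with its first sub-action
    for t in range(1, T):
        for k in range(K):
            stay = dp[t - 1][k]                     # remain in sub-action k
            advance = dp[t - 1][k - 1] if k > 0 else NEG  # move on from k-1
            if stay >= advance:
                dp[t][k], back[t][k] = stay, k
            else:
                dp[t][k], back[t][k] = advance, k - 1
            dp[t][k] += scores[t][k]
    # best final sub-action, then backtrack the label sequence
    k = max(range(K), key=lambda j: dp[T - 1][j])
    labels = [0] * T
    labels[T - 1] = k
    for t in range(T - 1, 0, -1):
        k = back[t][k]
        labels[t - 1] = k
    return dp[T - 1][labels[-1]], labels
```

Because `dp[t][·]` depends only on `dp[t-1][·]`, the recursion can be updated causally as snippets arrive, which is what makes an online labeling of this kind feasible.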
“…The idea of anticipation was introduced in the computer vision community almost a decade ago by [35]. While the early methods [34,40,39] relied on handcrafted features, they have now been superseded by end-to-end learning methods [21,12,1], focusing on designing new losses better suited to anticipation. In particular, the loss of [1] has proven highly effective, achieving state-of-the-art results on several standard benchmarks.…”
Section: Baseline Methods
confidence: 99%
“…Metrics. Following [34,38,48], we utilize video mean Average Precision (mAP) to evaluate action detection accuracy. We calculate an average of per-frame Intersection-over-Union (IoU) across time between tubes.…”
Section: Datasets, Metrics and Implementation
confidence: 99%
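The metric quoted above, per-frame IoU averaged across time between action tubes, can be sketched as follows. Names and the exact handling of non-overlapping frames are illustrative assumptions; a common convention is that frames covered by only one tube contribute zero, so temporal misalignment is penalized together with spatial misalignment.

```python
# Sketch of spatio-temporal tube IoU: per-frame box IoU averaged over the
# temporal union of two tubes. A tube is a dict mapping frame index -> box.

def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def tube_iou(tube_a, tube_b):
    """Average per-frame IoU over the temporal union of the two tubes.

    Frames present in only one tube contribute 0 (assumed convention).
    """
    frames = set(tube_a) | set(tube_b)
    if not frames:
        return 0.0
    total = sum(
        box_iou(tube_a[f], tube_b[f]) if f in tube_a and f in tube_b else 0.0
        for f in frames
    )
    return total / len(frames)
```

In a video-mAP evaluation, a detected tube would then count as a true positive at threshold τ (e.g. 0.5) when its class matches the ground truth and `tube_iou` meets the threshold, with AP averaged over classes.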