2017 IEEE International Conference on Computer Vision (ICCV) 2017
DOI: 10.1109/iccv.2017.393
Online Real-Time Multiple Spatiotemporal Action Localisation and Prediction

Abstract: Figure 1: Online spatio-temporal action localisation in a test 'fencing' video from UCF-101-24 [43]. Panels (a) to (c) show a 3D volumetric view of the video with detection boxes and selected frames at 40%, 80%, and 100% of the video observed. At any given time, a certain portion (%) of the entire video is observed by the system, and the detection boxes are linked up to incrementally build space-time action tubes. Note that the proposed method is able to detect multiple co-occurri…



Cited by 264 publications (399 citation statements). References 63 publications.
“…Action anticipation results for UCF101-24 considering 50% of frames from each video (Table 2):

Method                  Accuracy
Temporal Fusion [11]    86.0
ROAD [47]               92.0
ROAD + BroxFlow [47]    90.0
RBF-RNN [45]            98.0
Proposed                98.9…”
Section: Methods
confidence: 99%
“…UCF101-24 [47] is a subset of the UCF101 dataset. It is composed of 24 action classes in 3207 videos.…”
Section: Datasets
confidence: 99%
“…An extra dropout layer is further added with dropout ratio 0.5 before the softmax/sigmoid layer. Following [17,24,28,38], we also exploit a two-stream pipeline for utilizing multiple modalities, where the RGB frame and the stacked optical flow "image" are considered. To fuse the detection results, a late fusion scheme is adopted to average the classification scores.…”
Section: Implementations
confidence: 99%
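The late-fusion step described in the statement above can be sketched as follows. This is a minimal illustration only: the function name and the equal-weight averaging of the two streams are assumptions, not the cited papers' exact implementation.

```python
import numpy as np

def late_fuse(rgb_scores: np.ndarray, flow_scores: np.ndarray) -> np.ndarray:
    """Late fusion: average per-class classification scores from the
    RGB stream and the optical-flow stream (equal weights assumed)."""
    return (rgb_scores + flow_scores) / 2.0

# Hypothetical per-class scores for one detection from each stream.
rgb = np.array([0.7, 0.2, 0.1])
flow = np.array([0.5, 0.4, 0.1])
fused = late_fuse(rgb, flow)  # -> [0.6, 0.3, 0.1]
```

Because fusion happens after each stream produces its own scores (rather than by merging features), the two networks can be trained and run independently.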
“…For fair comparisons, we also utilize ResNet101 [11] as the backbone in our TPN. Following [17,24,28,38], we report the performance of LSTR with late fusion of RGB and optical flow inputs. Table 4 summarizes video-mAP performance on the UCF-Sports, J-HMDB (3 splits) and UCF-101 datasets with different IoU thresholds δ.…”
Section: Comparison With State-of-the-art
confidence: 99%
“…Singh et al [30] recently developed a method that generates candidate action bounding boxes in frames based on appearance and flow. These bounding boxes are incrementally grouped into action tubes, and those (partial) tubes are assigned a class probability with the Viterbi algorithm.…”
Section: Related Work
confidence: 99%
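The incremental box-linking idea described above can be sketched as a Viterbi-style dynamic program over per-frame detections. This is a simplified sketch under stated assumptions: the transition score (detection score plus IoU overlap with the previous box, with equal weighting) is a hypothetical choice, not the exact formulation of Singh et al.

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def viterbi_link(frames):
    """Link one detection per frame into a tube.

    frames: list over time of lists of (box, score) candidates.
    Returns the index of the chosen detection in each frame, maximizing
    cumulative score + IoU continuity (an assumed scoring function).
    """
    prev_dp = [score for _, score in frames[0]]   # best value ending at each box
    backptr = []
    for t in range(1, len(frames)):
        cur_dp, cur_back = [], []
        for box, score in frames[t]:
            # Transition from every candidate in the previous frame.
            vals = [prev_dp[j] + score + iou(frames[t - 1][j][0], box)
                    for j in range(len(frames[t - 1]))]
            j_best = int(np.argmax(vals))
            cur_dp.append(vals[j_best])
            cur_back.append(j_best)
        backptr.append(cur_back)
        prev_dp = cur_dp
    # Backtrack from the best final detection.
    idx = int(np.argmax(prev_dp))
    path = [idx]
    for bk in reversed(backptr):
        idx = bk[idx]
        path.append(idx)
    return path[::-1]

# Hypothetical two-frame example: the high-scoring, overlapping boxes
# (index 0 in both frames) should be linked into one tube.
frames = [
    [((0, 0, 10, 10), 0.9), ((50, 50, 60, 60), 0.1)],
    [((1, 1, 11, 11), 0.8), ((50, 50, 60, 60), 0.2)],
]
tube = viterbi_link(frames)  # -> [0, 0]
```

Because each new frame only needs the previous frame's dynamic-programming values, linking can run incrementally as frames arrive, which is what makes the online setting of the paper feasible.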