2016
DOI: 10.48550/arxiv.1608.01529
Preprint
Deep Learning for Detecting Multiple Space-Time Action Tubes in Videos

Cited by 18 publications (27 citation statements)
References 0 publications
“…After detecting the action regions in the frames, some methods [74], [75], [76], [77], [78], [79], [80], [81] use optical flow to capture motion cues. They employ linking algorithms to connect the frame-level bounding boxes into spatio-temporal action tubes.…”
Section: Frame-level Action Detection
confidence: 99%
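The linking step described in the statement above — connecting frame-level bounding boxes into spatio-temporal action tubes — is often realized as a greedy pass over frames that extends each tube with the best-overlapping detection. A minimal sketch (the helper names `iou` and `link_tubes` and the score-plus-overlap criterion are illustrative assumptions, not taken from any cited paper):

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def link_tubes(detections, iou_thresh=0.3):
    """Greedily link per-frame detections into spatio-temporal tubes.

    `detections` is a list over frames; each frame is a list of
    (box, score) pairs. Each tube is extended by the unused detection
    maximizing (score + IoU with the tube's last box), provided the
    IoU exceeds `iou_thresh`; unlinked detections start new tubes.
    """
    tubes = [[d] for d in detections[0]]
    for frame in detections[1:]:
        unused = list(frame)
        for tube in tubes:
            last_box, _ = tube[-1]
            best, best_val = None, -1.0
            for d in unused:
                box, score = d
                ov = iou(last_box, box)
                if ov >= iou_thresh and score + ov > best_val:
                    best, best_val = d, score + ov
            if best is not None:
                tube.append(best)
                unused.remove(best)
        tubes.extend([d] for d in unused)
    return tubes
```

In practice the cited methods use more elaborate linking (e.g. dynamic programming over whole videos), but the per-frame greedy extension above captures the core idea of turning frame-level boxes into tubes.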
“…Weinzaepfel et al [79] replaced the linking algorithm with a tracking-by-detection method. Two-stream Faster R-CNN was then introduced by [76], [78]. Saha et al [78] fuse the scores of both streams based on the overlap between the appearance and motion detections.…”
Section: Frame-level Action Detection
confidence: 99%
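The excerpt describes Saha et al.'s fusion only at a high level (combining appearance- and motion-stream scores based on their overlap). A minimal sketch of one plausible form — an additive fusion with the motion contribution weighted by the IoU between the two streams' boxes; the exact formula is an assumption, not taken from the paper:

```python
def fuse_scores(app_score, mot_score, overlap):
    """Fuse appearance- and motion-stream detection scores.

    The motion score contributes in proportion to the spatial
    overlap (IoU in [0, 1]) between the appearance and motion boxes,
    so disagreeing streams add little and agreeing streams reinforce
    each other. This is an illustrative assumption, not the exact
    rule from [78].
    """
    return app_score + overlap * mot_score
```

With `overlap = 0` the appearance score passes through unchanged; with perfect overlap the two stream scores simply add.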
“…Spatio-temporal activity detection localizes activities within spatio-temporal tubes and requires heavier annotation work to collect the training data. [7,22,35,39,43] temporally track bounding boxes corresponding to activities in each frame to realize spatiotemporal activity detection. We focus on temporal activity detection [4,14,18,26,28,38] which only predicts the start and end times of the activities within long untrimmed videos and classifies the overall activity without spatially localizing people and objects in the frame.…”
Section: Activity Detection
confidence: 99%
“…The proposed pipeline (Figure 2) is divided into three parts: (i) action tube detection, (ii) part-based feature extraction and learning via 3D deformable RoI pooling, and (iii) a sparsity strategy that allows components of an action to be activated or deactivated while retaining the overall semantics of the activity. Action tube detection is a necessary preprocessing step aimed at detecting the spatio-temporal location of all the atomic actions present [11,22,41,31,4,40,33]. The tube detector needs to ensure a fixed-size representation for each activity part (atomic action).…”
Section: Introduction
confidence: 99%
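The last quoted statement requires a fixed-size representation for each activity part even though detected tubes vary in length. One simple way to obtain that — uniform temporal resampling of a tube's per-frame boxes to a fixed number of parts — can be sketched as follows (the function name `sample_tube` and the uniform-sampling choice are illustrative assumptions, not the cited paper's method):

```python
def sample_tube(tube_boxes, num_parts=4):
    """Resample a variable-length tube to a fixed temporal size.

    `tube_boxes` is the tube's list of per-frame boxes (any length >= 1);
    the result always has exactly `num_parts` entries, obtained by
    uniformly spaced index sampling, so every activity part feeds a
    fixed-size feature extractor downstream.
    """
    n = len(tube_boxes)
    idx = [round(i * (n - 1) / (num_parts - 1)) for i in range(num_parts)]
    return [tube_boxes[i] for i in idx]
```

A tube of one frame is simply repeated, while longer tubes are subsampled evenly across their duration.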