Rethinking the Faster R-CNN Architecture for Temporal Action Localization

Chao, Yu-Wei; Vijayanarasimhan, Sudheendra; Seybold, Bryan; Ross, David A.; Deng, Jia; Sukthankar, Rahul

doi:10.1109/cvpr.2018.00124

Cited by 638 publications

(435 citation statements)

References 45 publications

Supporting

Mentioning: 429

Contrasting

Unclassified

“…Buch et al [2] introduce semantics constraints for curriculum training in end-to-end temporal action localization. Chao et al [8] adopt Faster R-CNN [30] for action localization task.…”

Section: Related Work (mentioning)

confidence: 99%

Learning Temporal Action Proposals With Fewer Labels

Cao, Niebles, 2019

2019 IEEE/CVF International Conference on Computer Vision (ICCV)

Temporal action proposals are a common module in action detection pipelines today. Most current methods for training action proposal modules rely on fully supervised approaches that require large amounts of annotated temporal action intervals in long video sequences. The large cost and effort in annotation that this entails motivate us to study the problem of training proposal modules with less supervision. In this work, we propose a semi-supervised learning algorithm specifically designed for training temporal action proposal networks. When only a small number of labels are available, our semi-supervised method generates significantly better proposals than the fully-supervised counterpart and other strong semi-supervised baselines. We validate our method on two challenging action detection video datasets, ActivityNet v1.3 and THUMOS14. We show that our semi-supervised approach consistently matches or outperforms the fully supervised state-of-the-art approaches.

“…Temporal Action localization has attracted increasing attention in the last several years [6,18,26,33,34]. Inspired by the success of object detection, most current action detection methods resort to the two-stage pipeline: they first generate a set of 1D temporal proposals and then perform classification and temporal boundary regression on each proposal individually.…”

Section: Introduction (mentioning)

confidence: 99%

Graph Convolutional Networks for Temporal Action Localization

Zeng, Huang, Gan, et al. 2019

2019 IEEE/CVF International Conference on Computer Vision (ICCV)

Most state-of-the-art action localization systems process each action proposal individually, without explicitly exploiting their relations during learning. However, the relations between proposals actually play an important role in action localization, since a meaningful action always consists of multiple proposals in a video. In this paper, we propose to exploit the proposal-proposal relations using Graph Convolutional Networks (GCNs). First, we construct an action proposal graph, where each proposal is represented as a node and their relations between two proposals as an edge. Here, we use two types of relations, one for capturing the context information for each proposal and the other one for characterizing the correlations between distinct actions. Then we apply the GCNs over the graph to model the relations among different proposals and learn powerful representations for the action classification and localization. Experimental results show that our approach significantly outperforms the state-of-the-art on THUMOS14 (49.1% versus 42.8%). Moreover, augmentation experiments on ActivityNet also verify the efficacy of modeling action proposal relationships.
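
The proposal graph described in the abstract above can be illustrated with a minimal sketch. The abstract does not say how edges are chosen, so this sketch assumes an edge is added whenever two proposals overlap in time beyond a threshold on temporal IoU. The names `temporal_iou` and `build_proposal_graph` and the 0.5 threshold are hypothetical, not taken from the paper.

```python
from typing import List, Tuple

# A proposal is a (start, end) interval in seconds.
Segment = Tuple[float, float]


def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection over union of two 1D temporal segments."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def build_proposal_graph(
    proposals: List[Segment], threshold: float = 0.5
) -> List[Tuple[int, int]]:
    """Return undirected edges (i, j), i < j, between overlapping proposals.

    Assumption: the edge rule (tIoU > threshold) is illustrative only;
    the abstract does not specify how relations are defined.
    """
    edges = []
    for i in range(len(proposals)):
        for j in range(i + 1, len(proposals)):
            if temporal_iou(proposals[i], proposals[j]) > threshold:
                edges.append((i, j))
    return edges


# Each proposal is a node; the first two overlap heavily, the third is isolated.
proposals = [(0.0, 10.0), (2.0, 10.0), (20.0, 30.0)]
edges = build_proposal_graph(proposals)
```

A graph convolution would then aggregate features along these edges; that step is omitted here because it depends on model details the abstract does not give.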

“…It combines 2D convolutional neural network and optical flow to capture appearance and motion features respectively. Recently, as kinds of 3D convolutional neural networks such as C3D [22], P3D [18], I3D [2] and 3D-ResNet [9] appear, adopting 3D convolutional neural network to extract spatio-temporal feature is getting more and more popular [1,2,25,3]. Temporal Action Proposals and Detection.…”

Section: Related Work (mentioning)

confidence: 99%

“…Temporal Action Proposals and Detection. Since natural videos are always long and untrimmed, temporal action proposals and detection have aroused intensive interest from researchers [6,26,1,25,3,8]. DAP [4] leverages LSTM to encode the video sequence for temporal features.…”

Section: Related Work (mentioning)

confidence: 99%

Deep Point-Wise Prediction for Action Temporal Proposal

Kong, Sun, et al. 2019

Lecture Notes in Computer Science

Detecting actions in videos is an important yet challenging task. Previous works usually utilize (a) sliding window paradigms, or (b) per-frame action scoring and grouping to enumerate the possible temporal locations. Their performances are also limited to the designs of sliding windows or grouping strategies. In this paper, we present a simple and effective method for temporal action proposal generation, named Deep Point-wise Prediction (DPP). DPP simultaneously predicts the action existing possibility and the corresponding temporal locations, without the utilization of any handcrafted sliding window or grouping. The whole system is end-to-end trained with joint loss of temporal action proposal classification and location prediction. We conduct extensive experiments to verify its effectiveness, generality and robustness on standard THUMOS14 dataset. DPP runs more than 1000 frames per second, which largely satisfies the real-time requirement. The code is available at https://github.com/liluxuan1997/DPP.
