End-to-End, Single-Stream Temporal Action Detection in Untrimmed Videos

Buch, Shyamal; Escorcia, Víctor; Ghanem, Bernard; Li, Feifei; Niebles, Juan Carlos

doi:10.5244/c.31.93

Cited by 191 publications

(146 citation statements)

References 20 publications

(43 reference statements)

Supporting

Mentioning: 142

Contrasting


“…(3) methods developing end-to-end architectures integrating the proposal generation and classification [48,1,26]. Our work is built upon the second category where the action proposals are first generated and then used to perform classification and boundary regression.…”

Section: Related Work (mentioning)

confidence: 99%


Graph Convolutional Networks for Temporal Action Localization

Zeng, Huang, Gan, et al. 2019

2019 IEEE/CVF International Conference on Computer Vision (ICCV)

485

233


Most state-of-the-art action localization systems process each action proposal individually, without explicitly exploiting their relations during learning. However, the relations between proposals actually play an important role in action localization, since a meaningful action always consists of multiple proposals in a video. In this paper, we propose to exploit the proposal-proposal relations using Graph Convolutional Networks (GCNs). First, we construct an action proposal graph, where each proposal is represented as a node and their relations between two proposals as an edge. Here, we use two types of relations, one for capturing the context information for each proposal and the other one for characterizing the correlations between distinct actions. Then we apply the GCNs over the graph to model the relations among different proposals and learn powerful representations for the action classification and localization. Experimental results show that our approach significantly outperforms the state-of-the-art on THUMOS14 (49.1% versus 42.8%). Moreover, augmentation experiments on ActivityNet also verify the efficacy of modeling action proposal relationships.



“…For simplicity, we denote the best tIoU and best overlap as tIoU and OL. Then, three types of training samples can be described as: (1) (3) Background sample: tIoU ≤ θ_4. These certain thresholds are slightly different on two datasets as shown in Table A.…”

Section: Training Details (mentioning)

confidence: 99%
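
The statement above labels a proposal as a background sample when its best temporal IoU (tIoU) against the ground truth falls at or below a threshold. A minimal sketch of that check, assuming intervals are given as (start, end) pairs in seconds: the function names and the `theta_bg` parameter are hypothetical, and the actual threshold values are in the citing paper's Table A, which is not reproduced here.

```python
def temporal_iou(a, b):
    """Temporal IoU between two (start, end) intervals."""
    (a_start, a_end), (b_start, b_end) = a, b
    # Length of the overlap, clamped at zero for disjoint intervals.
    inter = max(0.0, min(a_end, b_end) - max(a_start, b_start))
    union = (a_end - a_start) + (b_end - b_start) - inter
    return inter / union if union > 0 else 0.0


def best_tiou(proposal, ground_truths):
    """Highest tIoU of a proposal against any ground-truth interval."""
    return max((temporal_iou(proposal, gt) for gt in ground_truths), default=0.0)


def is_background(proposal, ground_truths, theta_bg):
    """Background test from the quote: best tIoU <= threshold.

    theta_bg is a placeholder for the quoted threshold; its value
    is an assumption supplied by the caller, not taken from the paper.
    """
    return best_tiou(proposal, ground_truths) <= theta_bg
```

For example, `is_background((0.0, 1.0), [(2.0, 3.0)], theta_bg=0.1)` holds because the proposal does not overlap the only ground-truth interval.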

Graph Convolutional Networks for Temporal Action Localization

Zeng, Huang, Gan, et al. 2019

2019 IEEE/CVF International Conference on Computer Vision (ICCV)

485

233


“…There are also end-to-end frameworks that enable joint optimization of proposal generation and action classification. Buch et al [2] introduce semantics constraints for curriculum training in end-to-end temporal action localization. Chao et al [8] adopt Faster R-CNN [30] for action localization task.…”

Section: Related Work (mentioning)

confidence: 99%

Learning Temporal Action Proposals With Fewer Labels

Cao, Niebles 2019

2019 IEEE/CVF International Conference on Computer Vision (ICCV)

Self Cite


Temporal action proposals are a common module in action detection pipelines today. Most current methods for training action proposal modules rely on fully supervised approaches that require large amounts of annotated temporal action intervals in long video sequences. The large cost and effort in annotation that this entails motivate us to study the problem of training proposal modules with less supervision. In this work, we propose a semi-supervised learning algorithm specifically designed for training temporal action proposal networks. When only a small number of labels are available, our semi-supervised method generates significantly better proposals than the fully-supervised counterpart and other strong semi-supervised baselines. We validate our method on two challenging action detection video datasets, ActivityNet v1.3 and THUMOS14. We show that our semi-supervised approach consistently matches or outperforms the fully supervised state-of-the-art approaches.


“…In most works, the temporal action proposal generation is a sub-task of the overall approach. However, there also exist end-to-end approaches [2].…”

Section: Related Work (mentioning)

confidence: 99%

Investigation on Combining 3D Convolution of Image Data and Optical Flow to Generate Temporal Action Proposals

Schlosser, Münch, Arens 2019

2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)


In this paper, several variants of two-stream architectures for temporal action proposal generation in long, untrimmed videos are presented. Inspired by the recent advances in the field of human action recognition utilizing 3D convolutions in combination with two-stream networks and based on the Single-Stream Temporal Action Proposals (SST) architecture [3], four different two-stream architectures utilizing sequences of images on one stream and sequences of images of optical flow on the other stream are subsequently investigated. The four architectures fuse the two separate streams at different depths in the model; for each of them, a broad range of parameters is investigated systematically as well as an optimal parametrization is empirically determined. The experiments on the THUMOS'14 [11] dataset show that all four two-stream architectures are able to outperform the original single-stream SST and achieve state of the art results. Additional experiments revealed that the improvements are not restricted to a single method of calculating optical flow by exchanging the formerly used method of Brox [1] with FlowNet2 [10] and still achieving improvements.
