ActivityNet: A large-scale video benchmark for human activity understanding

Heilbron, Fabian Caba; Escorcia, Víctor; Ghanem, Bernard; Niebles, Juan Carlos

doi:10.1109/cvpr.2015.7298698

Cited by 1,729 publications

(883 citation statements)

References 32 publications

Supporting

Mentioning

816

Contrasting

Unclassified


“…We evaluate our formulation on a large-scale, realistic activity dataset: ActivityNet [4]. Using our proposed ranking losses in training significantly improves performance in both the activity detection and early activity detection tasks.…”

Section: Methods (mentioning)

confidence: 99%


Learning Activity Progression in LSTMs for Activity Detection and Early Detection

Sigal

Sclaroff

2016

2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)



In this work we improve training of temporal deep models to better learn activity progression for activity detection and early detection tasks. Conventionally, when training a Recurrent Neural Network, specifically a Long Short Term Memory (LSTM) model, the training loss only considers classification error. However, we argue that the detection score of the correct activity category, or the detection score margin between the correct and incorrect categories, should be monotonically non-decreasing as the model observes more of the activity. We design novel ranking losses that directly penalize the model on violation of such monotonicities, which are used together with classification loss in training of LSTM models. Evaluation on ActivityNet shows significant benefits of the proposed ranking losses in both activity detection and early detection tasks.
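The monotonicity idea in the abstract above can be illustrated with a small sketch. It is not the authors' formulation: it assumes a hypothetical per-frame score sequence for the correct activity class and applies a simple hinge penalty whenever the score drops below the best value seen so far. The names `monotonicity_penalty`, `total_loss`, and `weight` are illustrative assumptions, not taken from the paper.

```python
from typing import Sequence


def monotonicity_penalty(scores: Sequence[float]) -> float:
    """Sum of hinge penalties for every drop below the running maximum.

    scores[t] is assumed to be the detection score of the correct
    activity class after the model has observed frames 0..t.
    This is an illustrative penalty, not the paper's exact loss.
    """
    penalty = 0.0
    running_max = float("-inf")
    for s in scores:
        # Penalize only when the score falls below the best seen so far.
        if s < running_max:
            penalty += running_max - s
        running_max = max(running_max, s)
    return penalty


def total_loss(classification_loss: float,
               scores: Sequence[float],
               weight: float = 1.0) -> float:
    """Combine a classification loss with the weighted monotonicity penalty."""
    return classification_loss + weight * monotonicity_penalty(scores)
```

A score sequence that never decreases incurs no penalty under this sketch, while any dip adds a cost proportional to its size. In this form the penalty is simply added to the classification loss, mirroring the abstract's statement that the ranking losses are used together with the classification loss.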


“…The ActivityNet [4] dataset comprises 28K videos of 203 activity categories collected from YouTube. Fig.…”

Section: Dataset (mentioning)

confidence: 99%


Learning Activity Progression in LSTMs for Activity Detection and Early Detection

Sigal

Sclaroff

2016

2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)



“…We report results on the 213 test videos with temporal annotations. To study the generalization capability of our model across datasets, we also test on the validation set of the ActivityNet benchmark (release 1.2) [3], which comprises 76 hours of video and 100 action classes. No fine tuning is done on this benchmark.…”

Section: Methods (mentioning)

confidence: 99%

“…However, it was recently shown that the temporal footprint of some methods can be as accurate as sampling temporal proposals uniformly in the video [4]. Moreover, these methods evaluate their performance on simple or repetitive actions in short video clips, which makes it difficult to gauge their scalability to large collections of video sequences containing more challenging activities [18,3]. Given the current state-of-the-art of spatio-temporal action proposals, it is worth exploring how only temporal action proposals can contribute to the semantic analysis of videos.…”

Section: Introduction (mentioning)

confidence: 99%

DAPs: Deep Action Proposals for Action Understanding

Escorcia

Heilbron

Niebles

et al. 2016

Lecture Notes in Computer Science

Self Cite



Abstract. Object proposals have contributed significantly to recent advances in object understanding in images. Inspired by the success of this approach, we introduce Deep Action Proposals (DAPs), an effective and efficient algorithm for generating temporal action proposals from long videos. We show how to take advantage of the vast capacity of deep learning models and memory cells to retrieve from untrimmed videos temporal segments, which are likely to contain actions. A comprehensive evaluation indicates that our approach outperforms previous work on a large scale action benchmark, runs at 134 FPS making it practical for large-scale scenarios, and exhibits an appealing ability to generalize, i.e. to retrieve good quality temporal proposals of actions unseen in training.
