2019 IEEE/CVF International Conference on Computer Vision (ICCV) 2019
DOI: 10.1109/iccv.2019.00717
Learning Temporal Action Proposals With Fewer Labels

Abstract: Temporal action proposals are a common module in action detection pipelines today. Most current methods for training action proposal modules rely on fully supervised approaches that require large amounts of annotated temporal action intervals in long video sequences. The large cost and effort in annotation that this entails motivate us to study the problem of training proposal modules with less supervision. In this work, we propose a semi-supervised learning algorithm specifically designed for training tempora…

Cited by 38 publications (37 citation statements). References 29 publications.
“…Gong et al [176] is a self-supervised method that attained state-of-the-art results on ActivityNet-1.2 among methods with limited supervision, confirming the advantage of self-supervised learning. Recent state-of-the-art weakly supervised methods such as D2-Net [174] achieved performance comparable to the semi-supervised methods of Ji et al [139] and TTC-Loc [175]. This is especially interesting because D2-Net [174] does not use temporal annotations of actions at all, while Ji et al [139] and TTC-Loc [175] use temporal annotations for at least a small percentage of videos in the dataset.…”
Section: Methods With Limited Supervision
confidence: 83%
“…In the semi-supervised setting, a small number of videos are fully annotated with the temporal boundaries of actions and class labels, while a large number of videos are either unlabeled or include only video-level labels. Ji et al [139] employ a fully supervised framework, known as BSN [46], to exploit the small set of labeled data. They encode the input video into a feature sequence and apply sequential perturbations (time warping and time masking [140]) to it.…”
Section: Semi-supervised Action Detection
confidence: 99%
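The sequential perturbations mentioned above (time warping and time masking) can be sketched on a per-video feature sequence. This is a minimal illustration assuming features come as a `(T, D)` NumPy array; the function names, span sizes, and linear-interpolation warp are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def time_mask(features, max_span=8, rng=None):
    """Zero out a random contiguous span of time steps (time masking).

    Assumption: features is a (T, D) array; masked steps are set to 0.
    """
    rng = rng or np.random.default_rng()
    T = features.shape[0]
    span = int(rng.integers(1, max_span + 1))
    start = int(rng.integers(0, max(T - span, 1)))
    out = features.copy()
    out[start:start + span] = 0.0
    return out

def time_warp(features, scale=1.25):
    """Resample the sequence to round(scale * T) steps via linear
    interpolation between neighboring feature vectors (time warping)."""
    T, _ = features.shape
    new_T = max(int(round(T * scale)), 1)
    src = np.linspace(0.0, T - 1, new_T)       # fractional source indices
    lo = np.floor(src).astype(int)
    hi = np.minimum(lo + 1, T - 1)
    frac = (src - lo)[:, None]
    return (1.0 - frac) * features[lo] + frac * features[hi]
```

In a consistency-training setup of this kind, the perturbed sequence would be fed through the proposal network and its predictions encouraged to match those on the clean sequence.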
“…We apply our method to generate action proposals. Action proposal generation is an essential part of many methods for action detection, explored by a number of recent papers [8, 10, 15, 19-21, 38]. A popular approach to generating action proposals is to estimate an actionness score for each temporal unit and then apply some form of temporal grouping and non-maximum suppression.…”
Section: Action Proposals
confidence: 99%
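The actionness-based pipeline described above (per-unit scores, threshold grouping, then non-maximum suppression) can be sketched as follows. This is a generic illustration of the approach, not code from any of the cited papers; the threshold values and scoring by mean actionness are assumptions.

```python
def group_proposals(actionness, thr=0.5):
    """Group consecutive temporal units with actionness >= thr into
    (start, end, score) proposals, scored by mean actionness."""
    proposals, start = [], None
    for t, a in enumerate(actionness):
        if a >= thr and start is None:
            start = t                         # open a new segment
        elif a < thr and start is not None:
            seg = actionness[start:t]
            proposals.append((start, t, sum(seg) / len(seg)))
            start = None                      # close the segment
    if start is not None:                     # segment runs to the end
        seg = actionness[start:]
        proposals.append((start, len(actionness), sum(seg) / len(seg)))
    return proposals

def _tiou(a, b):
    """Temporal IoU between two (start, end, score) proposals."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union else 0.0

def temporal_nms(proposals, iou_thr=0.5):
    """Greedy non-maximum suppression: keep the highest-scoring
    proposals, discarding any with temporal IoU >= iou_thr to a kept one."""
    kept = []
    for p in sorted(proposals, key=lambda q: -q[2]):
        if all(_tiou(p, k) < iou_thr for k in kept):
            kept.append(p)
    return kept
```

Real systems typically refine the grouped boundaries and rank proposals with a learned confidence rather than the raw mean actionness used here.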
“…Only two works have so far explored less supervised alternatives: Ji et al [30] and Khatir et al [31]. With a semi-supervised approach in [30], the authors investigate how the performance of a model is affected when varying the amount of labels used during training. Meanwhile, the method in [31] extracts proposals using an online agglomerative clustering based on distances between consecutive frame features.…”
Section: Temporal Action Proposals
confidence: 99%
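The online clustering idea attributed to [31] above can be sketched as merging consecutive frames into a segment while each new frame stays close to the segment's running mean. This is a hypothetical simplification assuming `(T, D)` frame features and a Euclidean distance threshold; it is not the cited method's actual algorithm.

```python
import numpy as np

def cluster_consecutive(frame_feats, dist_thr=1.0):
    """Merge consecutive frames into segments online: a frame joins the
    current segment if its distance to the segment's running mean is
    below dist_thr; otherwise a new segment starts.

    Returns a list of (start, end) index pairs (end exclusive).
    """
    centroid, start, n = frame_feats[0], 0, 1
    segments = []
    for t in range(1, len(frame_feats)):
        f = frame_feats[t]
        if np.linalg.norm(f - centroid) < dist_thr:
            centroid = (centroid * n + f) / (n + 1)   # online mean update
            n += 1
        else:
            segments.append((start, t))               # close current segment
            centroid, start, n = f, t, 1
    segments.append((start, len(frame_feats)))
    return segments
```

Each resulting segment can then serve as a candidate action proposal without any temporal annotations.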