2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
DOI: 10.1109/cvpr.2018.00706
Weakly Supervised Action Localization by Sparse Temporal Pooling Network

Abstract: We propose a weakly supervised temporal action localization algorithm on untrimmed videos using convolutional neural networks. Our algorithm learns from video-level class labels and predicts temporal intervals of human actions with no requirement of temporal localization annotations. We design our network to identify a sparse subset of key segments associated with target actions in a video using an attention module and fuse the key segments through adaptive temporal pooling. Our loss function is comprised of t…
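The fusion the abstract describes — attention scores used to pool a sparse subset of key segments into one video-level representation — can be sketched in plain numpy. This is an illustrative sketch, not the authors' implementation; the function name, toy features, and scores are invented for the example, and the sigmoid normalization follows the description of STPN quoted later on this page:

```python
import numpy as np

def attentive_temporal_pooling(segment_features, attention_scores):
    """Fuse per-segment features into a single video-level feature,
    weighting each segment by its sigmoid-normalized attention score."""
    weights = 1.0 / (1.0 + np.exp(-attention_scores))  # sigmoid, in [0, 1]
    # Weighted average over the temporal (segment) axis.
    pooled = (weights[:, None] * segment_features).sum(axis=0) / weights.sum()
    return pooled, weights

# Toy example: 4 segments with 3-dimensional features;
# segments 0 and 3 are the "key" segments (high attention logits).
feats = np.array([[1., 0., 0.],
                  [0., 1., 0.],
                  [0., 0., 1.],
                  [1., 1., 1.]])
scores = np.array([5.0, -5.0, -5.0, 5.0])
pooled, w = attentive_temporal_pooling(feats, scores)
# pooled is dominated by the two key segments: roughly [0.99, 0.5, 0.5]
```

The pooled vector is essentially the average of the two high-attention segments, which is the point of the attention module: background segments contribute almost nothing.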


Cited by 366 publications (305 citation statements)
References 43 publications
“…UntrimmedNets [28] employs attention mechanisms to learn the pattern of precut action segments. STPN [29] utilizes a sparsity constraint to detect activities, which improves action localization performance. TSR-Net [30] integrates self-attention and transfer learning with a temporal localization framework to obtain precise temporal intervals in untrimmed videos.…”
Section: B. Action Localization
confidence: 99%
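The sparsity constraint this excerpt refers to is, in STPN, an L1 penalty on the segment attention weights added to the classification loss. A minimal sketch, assuming numpy; the function name and the coefficient `beta` are illustrative, not taken from the paper:

```python
import numpy as np

def sparsity_loss(attention_weights, beta=1e-4):
    """L1 penalty pushing most segment attention weights toward zero,
    so only a sparse subset of key segments drives the video-level prediction."""
    return beta * np.abs(attention_weights).sum()

# Dense attention (spread over all 100 segments) is penalized more
# heavily than sparse attention concentrated on 5 key segments.
w_dense = np.full(100, 0.9)
w_sparse = np.zeros(100)
w_sparse[:5] = 0.9
```

Minimizing this term alongside the classification loss encourages the attention module to commit to a few discriminative segments rather than spreading weight over the whole video.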
“…Existing approaches have investigated different weak supervision strategies for action localization. The works of [25, 14, 28] use action category labels in videos for temporal localization, whereas [13] uses point-level supervision to spatio-temporally localize the actions. [17, 2] exploit the order of actions in a video as a weak supervision cue.…”
Section: Related Work
confidence: 99%
“…Weakly-supervised temporal action localization has been investigated using different types of weak labels, e.g., action categories [25, 28, 14], movie scripts [12, 1], and sparse spatio-temporal points [13]. Recently, Paul et al. [16] proposed an action localization approach, demonstrating state-of-the-art results, using video-level category labels as the weak supervision.…”
Section: Introduction
confidence: 99%
“…Li et al. [19] apply attention for action recognition and action detection in untrimmed sequences, using features from multiple modalities as the input to the temporal attention LSTM before softmax normalisation. Nguyen et al. [25] learn attention for action classification. They normalise the attention scores by a sigmoid function, and then use these to estimate the discriminative class-specific temporal regions for localising actions.…”
Section: Related Work
confidence: 99%