2021
DOI: 10.1109/tpami.2021.3078798
Weakly Supervised Temporal Action Localization through Contrast based Evaluation Networks

Cited by 20 publications (24 citation statements). References 55 publications.
“…5 the pseudo-labeling scheme, without auxiliary losses to regularize the learning process. Notably while some models like untrimmedNets [33] use a different backbone (TSN and TCN), most recent models [18,37,20,22] use the same two-stream I3D feature extraction backbone as our model does, thus are fair comparison from the feature extraction aspect. Compared to the best result among the four recent models [18,37,20,22], we get 3% significant improvement at mAP@0.5.…”
Section: Methods
confidence: 98%
“…Notably while some models like untrimmedNets [33] use a different backbone (TSN and TCN), most recent models [18,37,20,22] use the same two-stream I3D feature extraction backbone as our model does, thus are fair comparison from the feature extraction aspect. Compared to the best result among the four recent models [18,37,20,22], we get 3% significant improvement at mAP@0.5. Our model also shows more significant improvement at higher threshold metrics tIoU=0.6 and tIoU=0.7, which implies our action proposals are more complete.…”
Section: Methods
confidence: 98%
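The statements above compare localization quality at temporal IoU (tIoU) thresholds of 0.5 to 0.7. For reference, a minimal sketch of how temporal IoU between a predicted segment and a ground-truth segment is typically computed; the function name and (start, end) segment representation are illustrative, not taken from the cited papers:

```python
def temporal_iou(pred, gt):
    """Temporal intersection-over-union between two segments.

    Each segment is a (start, end) pair in seconds.
    Illustrative sketch, not code from the cited papers.
    """
    inter_start = max(pred[0], gt[0])
    inter_end = min(pred[1], gt[1])
    intersection = max(0.0, inter_end - inter_start)
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - intersection
    return intersection / union if union > 0 else 0.0


# Example: a proposal at 2.0-6.0 s against a ground-truth action at 3.0-7.0 s
print(temporal_iou((2.0, 6.0), (3.0, 7.0)))  # 3.0 / 5.0 = 0.6, a hit at tIoU=0.5 and 0.6 but a miss at 0.7
```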
“…We then compute final predictions by applying non-maximum suppression to eliminate overlapping and similar proposals. We compare our approach with an extensive set of leading recent baselines: TSM [35], CMCS [19], MAAN [36], 3C-Net [23], CleanNet [20], BaSNet [14], BM [25], DGAM [28], TSCN [37] and EM-MIL [22]. Details for each baseline can be found in the related work section, and we directly use the results reported by the respective authors.…”
Section: Methods
confidence: 99%
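The statement above mentions applying non-maximum suppression to eliminate overlapping and similar proposals before evaluation. A minimal sketch of greedy temporal NMS, assuming proposals are given as (start, end, score) tuples; the function name and the 0.5 overlap threshold are illustrative choices, not from the cited papers:

```python
def temporal_nms(proposals, iou_threshold=0.5):
    """Greedy temporal NMS over (start, end, score) proposals: keep the
    highest-scoring segment, drop remaining segments whose temporal IoU
    with it exceeds the threshold. Illustrative sketch only.
    """
    def tiou(a, b):
        # Overlap between two segments given by their first two entries (start, end).
        inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
        union = (a[1] - a[0]) + (b[1] - b[0]) - inter
        return inter / union if union > 0 else 0.0

    kept = []
    remaining = sorted(proposals, key=lambda p: p[2], reverse=True)
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        remaining = [p for p in remaining if tiou(best, p) <= iou_threshold]
    return kept


# Example: two overlapping proposals for the same action plus one distinct proposal
print(temporal_nms([(2.0, 6.0, 0.9), (2.5, 6.5, 0.7), (10.0, 12.0, 0.8)]))
# -> [(2.0, 6.0, 0.9), (10.0, 12.0, 0.8)]
```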
“…Nguyen et al [22] introduced a sparsity regularization for video-level classification. Shou et al [26] and Liu [17] investigated score contrast in the temporal dimension. Hideand-Seek [29] randomly removed frame sequences during training to force the network to respond to multiple relevant parts.…”
Section: Related Work
confidence: 99%
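The statement above refers to a sparsity regularization on temporal attention for video-level classification (Nguyen et al. [22]). A minimal sketch of an L1 sparsity term on per-snippet attention weights, in the spirit of that idea; the tensor shapes, function name, and loss weight are assumptions for illustration, not taken from the cited papers:

```python
import torch


def sparsity_loss(attention, weight=1e-4):
    """L1 penalty encouraging sparse temporal attention.

    attention: tensor of shape (batch, num_snippets) with values in [0, 1].
    The loss weight is an illustrative choice, not from the cited papers.
    """
    return weight * attention.abs().mean()


# Example: penalize dense attention over 100 snippets for a batch of 2 videos;
# this term would be added to the video-level classification loss.
attn = torch.sigmoid(torch.randn(2, 100))
total_sparsity_term = sparsity_loss(attn)
```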