2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr46437.2021.00610
Deep Analysis of CNN-based Spatio-temporal Representations for Action Recognition

Cited by 70 publications (45 citation statements) · References 54 publications
“…Using the same backbone network (ResNet-18), TCL needs only 15% and 33% labeled data on Jester and Mini-Something-V2, respectively, to reach the performance of the fully supervised approach. Likewise, we observe absolute improvements of up to 8.14% and 4.63% in activity recognition over the next best approach, FixMatch [46] (NeurIPS'20), using only 5% labeled data on the Mini-Something-V2 [9] and Kinetics-400 [31] datasets, respectively. We benchmark several baselines by extending state-of-the-art image-domain semi-supervised approaches to videos and will release their code along with that of TCL upon publication.…”
Section: Percentage of Labeled Data
confidence: 62%
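The FixMatch baseline referenced above trains on confidently pseudo-labeled, weakly augmented inputs while enforcing consistency on strongly augmented views of the same samples. Below is a minimal sketch of that objective applied to video clips, assuming a generic PyTorch classifier `model`; the augmented inputs and the 0.95 threshold are illustrative placeholders, not TCL's or the cited paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def fixmatch_unlabeled_loss(model, clips_weak, clips_strong, threshold=0.95):
    """FixMatch-style consistency loss on a batch of unlabeled video clips.

    clips_weak / clips_strong: the same clips under weak and strong
    augmentation, each of shape (B, C, T, H, W).
    """
    with torch.no_grad():
        probs = F.softmax(model(clips_weak), dim=1)  # pseudo-label source
        conf, pseudo = probs.max(dim=1)              # per-clip confidence and label
        mask = (conf >= threshold).float()           # keep only confident clips
    logits_strong = model(clips_strong)
    loss = F.cross_entropy(logits_strong, pseudo, reduction="none")
    return (loss * mask).mean()                      # averaged over the whole batch
```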
“…Datasets. We evaluate our approach using four datasets, namely Mini-Something-V2 [9], Jester [36], Kinetics-400 [31] and Charades-Ego [43]. Mini-Something-V2 is a subset of Something-Something V2 dataset [23] containing 81K training videos and 12K testing videos across 87 action classes.…”
Section: Methods
confidence: 99%
“…Given the growing demand for eHealth apps, it is surprising that there is not a larger body of work on estimating the physical intensity of activities in videos. This might be because video classification research has focused mostly on activity categorization [6,8,13,15,17], while virtually all exercise intensity assessment datasets rely on wearable sensors [2,5,39] delivering, e.g., heart rate or accelerometer signals. To promote the task of visually estimating the kilocalories burned per hour by a person during the current activity, we introduce the novel Vid2Burn dataset, featuring >9K videos of 72 different activity types with caloric expenditure annotations at both category and sample level.…”
Section: Vid2Burn: A Benchmark for Estimating Caloric Expenditure in ...
confidence: 99%
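The Vid2Burn excerpt frames caloric estimation as video regression rather than categorization. The sketch below is a hypothetical illustration of that framing only; the excerpt does not describe the dataset authors' actual model, and the backbone and feature dimension here are assumptions: any spatio-temporal encoder with a scalar head trained against kcal-per-hour annotations.

```python
import torch
import torch.nn as nn

class CaloricRegressor(nn.Module):
    """Hypothetical video-to-calorie regressor: a spatio-temporal backbone
    followed by a scalar head predicting kcal/hour."""
    def __init__(self, backbone: nn.Module, feat_dim: int):
        super().__init__()
        self.backbone = backbone           # assumed to map clips to (B, feat_dim)
        self.head = nn.Linear(feat_dim, 1)

    def forward(self, clips):              # clips: (B, C, T, H, W)
        return self.head(self.backbone(clips)).squeeze(-1)  # (B,) kcal/hour

# Training would minimize a regression loss against caloric annotations, e.g.:
# loss = nn.functional.l1_loss(model(clips), kcal_targets)
```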
“…CNN-based Action Recognition. Action recognition has recently been dominated by CNN-based models [17,6,15,16,7,26,47,57,28,20,46]. These models process the video as a spatio-temporal cube, extracting spatio-temporal features via their proposed temporal modeling methods.…”
Section: Related Work
confidence: 99%
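To make "processing the video as a cube" concrete, here is a minimal PyTorch example of a single spatio-temporal (3D) convolution over a clip laid out as (channels, time, height, width); the shapes are illustrative.

```python
import torch
import torch.nn as nn

# A batch of clips treated as 4D "cubes" per sample: (B, C, T, H, W).
clips = torch.randn(2, 3, 16, 112, 112)

# A 3x3x3 kernel slides jointly over time and space, so motion and
# appearance are mixed in a single operation.
conv3d = nn.Conv3d(in_channels=3, out_channels=64, kernel_size=3, padding=1)
features = conv3d(clips)  # -> torch.Size([2, 64, 16, 112, 112])
```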
“…
Model                   Top-1  Top-5
TRN-Inception [57]      28.3   53.9
TAM-R50 [15]            30.8   58.2
I3D-R50 [7]             31.2   58.9
SlowFast-R50-8×8 [17]   31.2   58.7
CoST-R101 [24]          32.4   60.0
SRTG-R3D-101 [41]       33.6   58.5
AssembleNet [37]        33.9   60.9
ViViT-L [1]             38.0   64.9
SIFAR-15 ‡              38.5   67.4
SIFAR-12 ‡              39.9   69.2
…”
Section: Model
confidence: 99%