2018
DOI: 10.1007/978-3-030-01225-0_18

Spatio-temporal Channel Correlation Networks for Action Classification

Abstract: The work in this paper is driven by the question of whether spatio-temporal correlations are enough for 3D convolutional neural networks (CNNs). Most traditional 3D networks use local spatio-temporal features. We introduce a new block that models correlations between channels of a 3D CNN with respect to temporal and spatial features. This new block can be added as a residual unit to different parts of 3D CNNs. We name our novel block 'Spatio-Temporal Channel Correlation' (STC). By embedding this block into the cur…
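The abstract describes the STC unit only at a high level. As a rough illustration, the sketch below implements a squeeze-and-excitation-style channel gate with a spatial branch (fully pooled descriptor) and a temporal branch (per-frame descriptors), wrapped as a residual unit around a 5D feature map; the class name, branch layout, and reduction ratio are illustrative assumptions, not the published STC design.

import torch
import torch.nn as nn


class ChannelCorrelationBlock3D(nn.Module):
    """Channel gate for a (N, C, T, H, W) feature map, used as a residual unit."""

    def __init__(self, channels: int, frames: int, reduction: int = 16):
        super().__init__()
        hidden = max(channels // reduction, 8)
        # Spatial branch: one descriptor per channel, pooled over T, H and W.
        self.spatial_fc = nn.Sequential(
            nn.Linear(channels, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, channels),
        )
        # Temporal branch: per-frame channel statistics (C x T), so the gate
        # can react to how channel activations evolve over time.
        self.temporal_fc = nn.Sequential(
            nn.Linear(channels * frames, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, t, h, w = x.shape
        per_frame = x.mean(dim=(3, 4))        # (N, C, T): spatially pooled
        global_desc = per_frame.mean(dim=2)   # (N, C): pooled over time as well
        gate = torch.sigmoid(
            self.spatial_fc(global_desc) + self.temporal_fc(per_frame.reshape(n, c * t))
        )
        # Rescale channels and add the input back (residual use of the block).
        return x + x * gate.view(n, c, 1, 1, 1)


# Example: insert the block after a stage with 256 channels and 16-frame clips.
block = ChannelCorrelationBlock3D(channels=256, frames=16)
out = block(torch.randn(2, 256, 16, 14, 14))  # output keeps the input shape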

Cited by 178 publications (138 citation statements)
References 41 publications (70 reference statements)

“…Our experiments are best with STCnet and 3D-ResNet/Next configuration which is of depth 101. In Table 5, we compare the performance of DynamoNet with current state-of-the-art methods on UCF101/HMDB51.…”

Results table embedded in this excerpt (method: two accuracy columns, as given):
…: 58.9 / -
C3D [17]: 55.6 / -
3D ResNet101 [17]: 62.8 / 83.9
3D ResNext101 [17]: 65.1 / 85.7
RGB-I3D [3]: 68.4 / 88
STC-ResNet101 (16 frames) [6]: 64.1 / 85.2
STC-ResNext101 (16 frames) [6]: 66.2 / 86.5
STC-ResNext101 (32 frames) [6]: 68.7 / 88.5
DynamoNet (ResNext) (…)

Section: Action Recognition (mentioning)
confidence: 99%
“…In this section, we study the proposed automatic method of designing action recognition networks to demonstrate its advantages over other well-known action recognition architectures, e.g., 3D-ResNet [19], the C3D network [20], and STC-ResNet [21]. We evaluate our algorithm on the challenging action recognition dataset UCF101, a trimmed dataset containing 13,320 video clips from 101 classes, under the training-from-scratch protocol.…”
Section: Methods (mentioning)
confidence: 99%
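For context on the evaluation setup quoted above, here is a minimal sketch of loading 16-frame UCF101 clips with torchvision's UCF101 wrapper; the local paths are placeholders and the configuration is an assumption, not the cited paper's pipeline (decoding the videos also requires a video backend such as PyAV).

from torchvision import datasets

# Placeholder paths: extracted UCF-101 videos and the official train/test split files.
train_set = datasets.UCF101(
    root="data/UCF-101",
    annotation_path="data/ucfTrainTestlist",
    frames_per_clip=16,       # 16-frame clips, matching the setups quoted earlier
    step_between_clips=16,    # non-overlapping clips
    fold=1,
    train=True,
)

video, audio, label = train_set[0]
print(video.shape, label)     # video: (T, H, W, C) uint8 tensor, label in [0, 100]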
“…
Model                #params   Model size   Accuracy
3D-ResNet 18 [19]    33.2M     252M         42.4%
3D-ResNet 101 [19]   100M+     652M         46.7%
3D-ConvNet [20]      79M       305M         51.6%
STC-ResNet 18 [21]   33.2M+    -            42.8%
STC-ResNet 50 [21]   92M+      -            46.2%
STC-ResNet 101 [21]  100M+     -            47.9%
Ours                 0.67M     7.32M        58.6%
…”
Section: Architectures (mentioning)
confidence: 99%
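The #params and model-size columns in a comparison like the one above are usually derived from the network's learnable weights. The snippet below shows one common way to compute such figures in PyTorch, using torchvision's r3d_18 purely as a stand-in backbone (an assumption, not one of the cited models); reported "model size" conventions vary, so the numbers will not reproduce the table.

from torchvision.models.video import r3d_18

model = r3d_18()  # randomly initialised 3D ResNet-18, used only as a stand-in
num_params = sum(p.numel() for p in model.parameters())
size_mb = num_params * 4 / (1024 ** 2)  # 4 bytes per float32 parameter
print(f"{num_params / 1e6:.1f}M parameters, ~{size_mb:.0f} MB at fp32")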
“…While these works analyze whether networks can be better trained with full supervision when additional modalities, including the modality of the test data, are available during training, we address the setting where the modality of the annotated training set differs from the modality of the test set. In [4], a 3D convolutional neural network is initialized by transferring the knowledge of a pre-trained 2D CNN. Cross-modal distillation has also been used for other tasks such as object detection [20], emotion recognition [21], or human pose estimation [22].…”
Section: Related Work (mentioning)
confidence: 99%
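One widely used recipe for initializing a 3D CNN from a pre-trained 2D CNN is to "inflate" each 2D kernel along the temporal axis and rescale it; the sketch below shows that generic recipe only, and is not necessarily the transfer procedure used in [4].

import torch
import torch.nn as nn


def inflate_conv2d_to_conv3d(conv2d: nn.Conv2d, time_kernel: int = 3) -> nn.Conv3d:
    """Seed a 3D convolution by repeating a 2D kernel over time (I3D-style inflation)."""
    conv3d = nn.Conv3d(
        conv2d.in_channels,
        conv2d.out_channels,
        kernel_size=(time_kernel, *conv2d.kernel_size),
        stride=(1, *conv2d.stride),
        padding=(time_kernel // 2, *conv2d.padding),
        bias=conv2d.bias is not None,
    )
    with torch.no_grad():
        # (out, in, kH, kW) -> (out, in, kT, kH, kW), divided by kT so the
        # inflated filter gives a similar response on a static input.
        w = conv2d.weight.unsqueeze(2).repeat(1, 1, time_kernel, 1, 1) / time_kernel
        conv3d.weight.copy_(w)
        if conv2d.bias is not None:
            conv3d.bias.copy_(conv2d.bias)
    return conv3d


# Example: inflate the first convolution of a 2D ResNet-style stem.
conv2d = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
conv3d = inflate_conv2d_to_conv3d(conv2d, time_kernel=3)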
“…Action recognition is addressed in many works; in particular, deep learning methods have been proposed for various modalities such as RGB videos [1,2,3,4] or skeleton data [5,6,7,8]. Deep learning methods for action recognition, however, require large annotated datasets.…”
Section: Introduction (mentioning)
confidence: 99%