“…Specifically, when pretrained with the 3D ResNet-18 backbone, our method outperforms 3D-RotNet [19], ST-Puzzle [21], and DPC [13] by a large margin (80.5% vs. 62.9%, 65.8%, and 68.2%, respectively, on UCF-101, and 52.3% vs. 33.7%, 33.7%, and 34.5% on HMDB-51). When S3D-G is used as the backbone, our ASCNet achieves higher accuracy than SpeedNet [2], Pace [34], and RSPNet [5] under the same settings (90.8% vs. 81.1%, 87.1%, and 89.9%, respectively, on UCF-101, and 60.5% vs. 48.8%, 52.6%, and 59.9% on HMDB-51). Remarkably, although it requires no annotations for pretraining, our ASCNet outperforms the ImageNet [10] supervised pretrained model on both datasets (90.8% vs. 86.6% and 60.5% vs. 57.7%).…”
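To make the size of each gap explicit, the following minimal Python sketch tabulates only the accuracies quoted above and computes the absolute margin, in percentage points, between ASCNet and each baseline. The dictionary layout and the `margins` helper are illustrative assumptions, not part of the original work. The second dataset is taken to be HMDB-51.

```python
# Top-1 accuracies (%) quoted in the passage, as (UCF-101, HMDB-51) pairs.
# The layout of this table is an illustrative assumption.
RESULTS = {
    "3D ResNet-18": {
        "ours": (80.5, 52.3),
        "3D-RotNet": (62.9, 33.7),
        "ST-Puzzle": (65.8, 33.7),
        "DPC": (68.2, 34.5),
    },
    "S3D-G": {
        "ours": (90.8, 60.5),
        "SpeedNet": (81.1, 48.8),
        "Pace": (87.1, 52.6),
        "RSPNet": (89.9, 59.9),
        "ImageNet supervised": (86.6, 57.7),
    },
}


def margins(table):
    """Hypothetical helper: absolute gain of 'ours' over each baseline, in points."""
    ours_ucf, ours_hmdb = table["ours"]
    return {
        name: (round(ours_ucf - ucf, 1), round(ours_hmdb - hmdb, 1))
        for name, (ucf, hmdb) in table.items()
        if name != "ours"
    }


if __name__ == "__main__":
    for backbone, table in RESULTS.items():
        for name, (d_ucf, d_hmdb) in margins(table).items():
            print(f"{backbone:13s} vs. {name:20s} UCF-101 +{d_ucf:.1f}  HMDB-51 +{d_hmdb:.1f}")
```

Run on the quoted numbers, the sketch shows that the 3D ResNet-18 gains range from about 12 to 19 points. With S3D-G, the gain over RSPNet is under one point on both datasets, which is worth keeping in mind when reading "better accuracy" in that comparison.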