2022
DOI: 10.1007/978-3-031-20062-5_42

CMD: Self-supervised 3D Action Representation Learning with Cross-Modal Mutual Distillation

Cited by 17 publications (30 citation statements) · References 47 publications
“…In the field of skeleton-based action recognition, prior works (Li et al, 2021;Mao et al, 2022;Guo et al, 2022) proposed to apply contrastive learning in the pre-training stage by roughly following the frameworks mentioned above. CrossCLR (Li et al, 2021) mined positive pairs in the data space and explored the cross-modal distribution relationships.…”
Section: Contrastive Learning (mentioning), confidence: 99%
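These contrastive pre-training frameworks typically optimize an InfoNCE-style objective that pulls two augmented views of the same skeleton sequence together and pushes all other samples apart. A minimal sketch of such a loss in PyTorch, assuming paired query/key embeddings from the two views (the memory-bank negatives used by MoCo-style variants are omitted for brevity):

```python
import torch
import torch.nn.functional as F

def info_nce_loss(z_query, z_key, temperature=0.07):
    """InfoNCE over a batch: matched (query, key) rows are positives,
    every other key in the batch serves as a negative."""
    z_query = F.normalize(z_query, dim=1)       # (N, D) embeddings of view 1
    z_key = F.normalize(z_key, dim=1)           # (N, D) embeddings of view 2
    logits = z_query @ z_key.t() / temperature  # (N, N) scaled cosine similarities
    labels = torch.arange(z_query.size(0), device=z_query.device)
    return F.cross_entropy(logits, labels)      # diagonal entries are the positives
```

In MoCo-style setups, z_query would come from the online encoder and z_key from a momentum-updated copy of it.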
“…CrossCLR (Li et al, 2021) mined positive pairs in the data space and explored the cross-modal distribution relationships. Further, CMD (Mao et al, 2022) transferred the cross-modal knowledge in a distillation manner. And AimCLR (Guo et al, 2022) used extreme augmentations to improve the representation universality.…”
Section: Contrastive Learning (mentioning), confidence: 99%
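The "distillation manner" mentioned here can be read as each modality's similarity distribution over a set of reference embeddings acting as a soft target for the other modality. A rough sketch under that reading, assuming normalized embeddings and key banks for a joint branch and a motion branch (function names, temperatures, and the exact KL formulation are illustrative assumptions, not the authors' code):

```python
import torch
import torch.nn.functional as F

def mutual_distillation_loss(z_joint, z_motion, keys_joint, keys_motion,
                             t_student=0.1, t_teacher=0.05):
    """Cross-modal mutual distillation sketch: the similarity distribution
    one modality induces over its key bank supervises the other modality."""
    def sim_dist(z, keys, t):
        # softmax over similarities to a bank of K reference embeddings
        return F.softmax(F.normalize(z, dim=1) @ F.normalize(keys, dim=1).t() / t, dim=1)

    p_joint_teacher = sim_dist(z_joint, keys_joint, t_teacher).detach()
    p_motion_teacher = sim_dist(z_motion, keys_motion, t_teacher).detach()
    log_p_joint = torch.log(sim_dist(z_joint, keys_joint, t_student) + 1e-8)
    log_p_motion = torch.log(sim_dist(z_motion, keys_motion, t_student) + 1e-8)

    # each branch is taught by the other branch's (detached) distribution
    loss_joint = F.kl_div(log_p_joint, p_motion_teacher, reduction='batchmean')
    loss_motion = F.kl_div(log_p_motion, p_joint_teacher, reduction='batchmean')
    return loss_joint + loss_motion
```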
“…Following the previous related works [27,44], BiGRU is adopted as the encoder for a fair comparison. All sequences are resized to a fixed length of 64 frames via temporal crop-resize [44].…”
Section: Implementation Details and Evaluation (mentioning), confidence: 99%
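Temporal crop-resize, as referenced above, takes a random contiguous crop of the skeleton sequence and interpolates it to a fixed length of 64 frames. A minimal sketch assuming a (T, V, C) NumPy array of frames, joints, and channels (the crop-ratio range is an assumption):

```python
import numpy as np
import torch
import torch.nn.functional as F

def temporal_crop_resize(seq, out_len=64, min_ratio=0.5):
    """Randomly crop a contiguous temporal segment of seq (T, V, C),
    then linearly interpolate it to out_len frames."""
    T = seq.shape[0]
    crop_len = np.random.randint(max(1, int(T * min_ratio)), T + 1)
    start = np.random.randint(0, T - crop_len + 1)
    crop = seq[start:start + crop_len]                    # (crop_len, V, C)

    x = torch.from_numpy(crop).float()
    x = x.reshape(crop_len, -1).t().unsqueeze(0)          # (1, V*C, crop_len)
    x = F.interpolate(x, size=out_len, mode='linear', align_corners=False)
    return x.squeeze(0).t().reshape(out_len, *seq.shape[1:]).numpy()
```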
“…The query encoder first completes the pre-training contrastive task on all unlabeled data. Then, the pre-trained encoder and linear classifier are fine-tuned on randomly sampled 1% and 10% labeled data.

Method               Modality             Accuracy (%)
(method truncated)   Joint+Motion+Bone    77.8
3s-AimCLR [9]        Joint+Motion+Bone    78.9
3s-HiCLR [55]        Joint+Motion+Bone    80.4
3s-CrosSCLR-B [19]   Joint+Motion+Bone    82.1
3s-CPM [54]          Joint+Motion+Bone    83.2
3s-HiCo [7]          Joint+Motion+Bone    83.8
3s-CMD [27]          Joint+Motion+Bone    84.1
3s-A²MC              Joint+Motion+Bone    84.6…”
Section: Implementation Details and Evaluation (mentioning), confidence: 99%
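The semi-supervised protocol summarized above fine-tunes the pre-trained encoder jointly with a linear classifier on the small labeled subset. A schematic of that loop, assuming a hypothetical pretrained_encoder module emitting feat_dim-dimensional features and a DataLoader over the 1% or 10% labeled split (optimizer choice and hyperparameters are placeholders, not the reported settings):

```python
import torch
import torch.nn as nn

def finetune_on_labeled_subset(pretrained_encoder, labeled_loader,
                               feat_dim=1024, num_classes=60,
                               epochs=50, lr=1e-3, device='cuda'):
    """Fine-tune encoder + linear classifier on a small labeled subset
    (e.g. the randomly sampled 1% or 10% split)."""
    classifier = nn.Linear(feat_dim, num_classes).to(device)
    pretrained_encoder.to(device).train()
    params = list(pretrained_encoder.parameters()) + list(classifier.parameters())
    optimizer = torch.optim.SGD(params, lr=lr, momentum=0.9, weight_decay=1e-4)
    criterion = nn.CrossEntropyLoss()

    for _ in range(epochs):
        for skeletons, labels in labeled_loader:
            skeletons, labels = skeletons.to(device), labels.to(device)
            loss = criterion(classifier(pretrained_encoder(skeletons)), labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return pretrained_encoder, classifier
```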