2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr46437.2021.00133
Self-supervised Motion Learning from Static Images

Cited by 20 publications (9 citation statements)
References 43 publications
“…We prefer to be less dataset-dependent and generate synthetic motion tubelets for contrastive learning, which also offers a considerable data-efficiency benefit. CtP [74] and MoSI [29] both aim to predict motions applied to the training data. CtP [74] learns to track image patches in video clips to focus on local motion features, while MoSI [29] adds pseudo-motions to static images and learns to predict the speed and direction of those motions to enhance video representations.…”
Section: Related Work
confidence: 99%
“…Modelling the temporal dynamics is essential for a genuine understanding of videos. Hence, it is widely explored in both the supervised [20,35,48,49,63,70] and self-supervised paradigms [28,29,34,36,39]. Self-supervised approaches learn temporal modelling by solving various pretext tasks, such as dense future prediction [28,29], jigsaw puzzle solving [36,39], and pseudo-motion classification [34]. Supervised video recognition explores various connections between different frames, such as 3D convolutions [62], temporal convolution [63], and temporal shift [48].…”
Section: Related Work
confidence: 99%
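For concreteness, the "pseudo-motion classification" pretext task mentioned in the statement above reduces to ordinary classification over synthetic motion labels. The sketch below assumes a generic video backbone and is not any specific paper's code: an encoder maps a clip to a feature vector, and a linear head predicts which synthetic motion class was applied.

```python
import torch.nn as nn

# Sketch of a pseudo-motion classification pretext head (assumed generic setup,
# not a specific paper's code): `encoder` is any video backbone mapping a clip
# to a feature vector; the head classifies the synthetic (direction, speed) class.
class PseudoMotionClassifier(nn.Module):
    def __init__(self, encoder, feat_dim, num_motion_classes):
        super().__init__()
        self.encoder = encoder
        self.head = nn.Linear(feat_dim, num_motion_classes)

    def forward(self, clips):          # clips: (B, C, T, H, W)
        feats = self.encoder(clips)    # (B, feat_dim)
        return self.head(feats)

# Training step (sketch): labels index the pseudo-motion class of each clip.
# logits = model(clips); loss = nn.functional.cross_entropy(logits, labels)
```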
“…Huang et al. [103] used SSL to address the problem of labelling video datasets, which otherwise requires a huge number of human annotators. SSL was used in their proposed model, Motion from Static Images (MoSI), to train video models by learning representations from either video or image datasets.…”
Section: Self-supervised Learning (SSL) Approach
confidence: 99%