2020
DOI: 10.1007/978-3-030-58520-4_30

Self-supervised Video Representation Learning by Pace Prediction

citations
Cited by 164 publications
(202 citation statements)
references
References 45 publications
“…However, our method always outperforms the two other methods such as VCOP and VCP regardless of the 3D ConvNet models. From table 9, our method shows a comparable results compared to the state-of-the-art (SOTA) self-supervised methods such as VCOP, VCP, Dense Predictive Coding (DPC) [39], SpeedNet [24], Temporal Transformation (TT) [25], Pace Prediction (PP) [40], and CoCLR [38]. Since each method has different backbone architecture, hyper-parameter setting, and augmentation method, we compare the performance of each backbone and augmentation setting to ensure fair comparison.…”
Section: B Action Recognition (mentioning)
confidence: 85%
“…Self-supervised learning aims to extract the underlying useful representation of unlabeled data by designing effective pretext tasks. Recently, self-supervised techniques have a broad range of applications in different domains such as computer vision [14][15][16][17][18], and audio/speech processing [19][20][21][22]. For visual data, various pretext tasks are designed including solving jigsaw puzzles [14], rotation prediction [15] and visual contrastive learning [16] for image, and frame order validation [17] and pace prediction [18]…”
Section: Related Work (mentioning)
confidence: 99%
“…Instead of directly predicting low-level information, methods based on spatio- more dedicate pretext tasks, such as temporal order prediction [13,14,39,40,75] and video speed prediction [15,16,18]. Compared to dense prediction methods, the spatio-temporal reasoning methods are more efficient since they discard additional generators.…”
Section: Spatio-temporal Reasoning Methods (mentioning)
confidence: 99%
“…Although improvement can be achieved by temporal order prediction, there is still a noticeable gap in performance when compared to fully-supervised methods. To narrow the gap of performance, recent methods [15,16,18] et al [18] propose a self-supervised pace prediction task, where they discard the generation task in PRP [15] and include an additional constrative learning task.…”
Section: Spatio-temporal Reasoning Methods (mentioning)
confidence: 99%