2022
DOI: 10.1016/j.cviu.2022.103406

TCLR: Temporal contrastive learning for video representation

Cited by 86 publications (42 citation statements)
References 39 publications
“…A discriminator is used to predict high probabilities for similar pairs and low probabilities for dissimilar pairs. To capture both local and global representations, Dave et al [2022] use a local loss that treats non-overlapping clips as negatives and spatially augmented versions of the same clip as positives, and a global-local loss that maximizes the similarity between the global representation of the entire clip and the local representations of the corresponding sub-clips.…”
Section: Spatio-temporal Augmentation
confidence: 99%
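The local-local objective described in the snippets above (non-overlapping clips from the same video as negatives, a spatially augmented version of each clip as its positive) can be sketched as an InfoNCE-style loss. This is a minimal NumPy illustration under stated assumptions: the function name, array shapes, and temperature value are mine, not the authors' code.

```python
import numpy as np

def infonce_local_local(anchors, positives, temperature=0.1):
    """Hypothetical sketch of a local-local temporal contrastive loss.

    anchors:   (N, D) features of N non-overlapping clips from one video
    positives: (N, D) features of spatially augmented versions of the
               same clips, in the same order
    Clip i's positive is positives[i]; every other clip acts as a
    negative, which pushes features apart across the temporal dimension.
    """
    # L2-normalise so dot products are cosine similarities
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = (a @ p.T) / temperature  # (N, N) similarity matrix
    # softmax cross-entropy with the matching index as the label
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))
```

When anchors and positives are correctly aligned, the diagonal of the similarity matrix dominates and the loss is small; shuffling the positives breaks the alignment and raises the loss.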
“…For example, Li et al (2021) proposed JigsawGAN, a GAN-based self-supervised method for solving jigsaw puzzles with unpaired images, to learn semantic and edge information of images. Contrastive learning (Tian et al, 2020; Wang et al, 2021; Dave et al, 2022) can be regarded as a discriminative method that aims to group positive samples and separate negative samples. Dave et al (2022) developed a new temporal contrastive learning framework comprising local–local and local–global temporal contrastive losses to encourage the features to be distinct across the temporal dimension.…”
Section: Related Work
confidence: 99%
“…Contrastive learning (Tian et al, 2020; Wang et al, 2021; Dave et al, 2022) can be regarded as a discriminative method that aims to group positive samples and separate negative samples. Dave et al (2022) developed a new temporal contrastive learning framework comprising local–local and local–global temporal contrastive losses to encourage the features to be distinct across the temporal dimension. Generative model-based approaches usually use generative tasks as pretext tasks to learn features, such as image reconstruction (Fan et al, 2022), image inpainting (Quan et al, 2022), image coloring (Bi et al, 2021), etc.…”
Section: Related Work
confidence: 99%
“…The identity of the animals is discarded and the semantic structure is preserved, as evidenced by the fact that the two red dots are very close to one another. …recognition using 3D pose data [39,42,68] and video-based action understanding [13,50]. However, a barrier to using these tools in neuroscience is that the statistics of our neural data (the locations and sizes of cells) and behavioral data (body-part lengths and limb ranges of motion) can be very different from animal to animal, creating a large domain gap.…”
Section: Related Work
confidence: 99%
“…Specifically, we trained our neural decoder f_n along with the others without using any action labels. Then, freezing the neural encoder parameters, we trained a linear model on the encoded features, an evaluation protocol widely used in the field [10,13,25,39]. We used either half or all of the action labels.…”
Section: Benchmarks
confidence: 99%
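The linear-evaluation protocol mentioned in the snippet above (freeze the encoder, then train only a linear classifier on the encoded features) can be sketched as follows. This is a hypothetical NumPy illustration, not the paper's implementation: the function name, learning rate, and step count are assumptions.

```python
import numpy as np

def linear_probe(features, labels, n_classes, lr=0.5, steps=200):
    """Hypothetical sketch of linear evaluation: the encoder is frozen,
    so `features` are fixed, and only this linear softmax classifier is
    trained on the (action) labels by gradient descent."""
    n, d = features.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[labels]
    for _ in range(steps):
        logits = features @ W + b
        logits = logits - logits.max(axis=1, keepdims=True)
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n        # softmax cross-entropy gradient
        W -= lr * (features.T @ grad)
        b -= lr * grad.sum(axis=0)
    preds = np.argmax(features @ W + b, axis=1)
    return (preds == labels).mean()        # probe accuracy
```

Because only the linear layer is trained, the resulting accuracy measures how linearly separable the frozen representation already is, which is why the protocol is a standard proxy for representation quality.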