Self-supervised Motion Learning from Static Images

Huang, Ziyuan; Zhang, Shiwei; Jiang, Jianwen; Tang, Mingqian; Jin, Rong; Ang, Marcelo H.

doi:10.1109/cvpr46437.2021.00133

Cited by 20 publications

(9 citation statements)

References 43 publications

“…We prefer to be less dataset-dependent and generate synthetic motion tubelets for contrastive learning, which also offers a considerable data-efficiency benefit. CtP [74] and MoSI [29] both aim to predict motions to the training data. CtP [74] learns to track image patches in video clips to focus on local motion features while MoSI [29] adds pseudo-motions to static images and learns to predict the speed and direction of motions to enhance video representations.…”

Section: Related Work (mentioning)

confidence: 99%

Tubelet-Contrastive Self-Supervision for Video-Efficient Generalization

Thoker, Doughty, Snoek

2023

Preprint

We propose a self-supervised method for learning motion-focused video representations. Existing approaches minimize distances between temporally augmented videos, which maintain high spatial similarity. We instead propose to learn similarities between videos with identical local motion dynamics but an otherwise different appearance. We do so by adding synthetic motion trajectories to videos which we refer to as tubelets. By simulating different tubelet motions and applying transformations, such as scaling and rotation, we introduce motion patterns beyond what is present in the pretraining data. This allows us to learn a video representation that is remarkably data-efficient: our approach maintains performance when using only 25% of the pretraining videos. Experiments on 10 diverse downstream settings demonstrate our competitive performance and generalizability to new domains and fine-grained actions.
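The citation statement above describes MoSI as adding pseudo-motions to static images and learning to predict their speed and direction. A minimal sketch of that idea follows, under loud assumptions: the function name `pseudo_motion_clip`, the sliding-crop construction, and the `(axis, speed)` label format are all hypothetical illustrations, not the authors' implementation, and plain nested lists stand in for image arrays.

```python
def pseudo_motion_clip(image, num_frames, crop_size, speed, axis):
    """Fake motion by sliding a square crop window over one static image.

    image: 2D list of pixel values (hypothetical stand-in for an image array).
    speed: signed pixels per frame; the sign encodes the direction.
    axis: 0 slides the window vertically, 1 horizontally.
    Returns (frames, label) where label = (axis, speed) is the assumed
    prediction target for a motion classifier.
    """
    height, width = len(image), len(image[0])
    if crop_size > height or crop_size > width:
        raise ValueError("crop does not fit inside the image")
    # Total distance the window covers between the first and last frame.
    travel = abs(speed) * (num_frames - 1)
    limit = (height if axis == 0 else width) - crop_size
    if travel > limit:
        raise ValueError("window would leave the image")
    # Negative speeds start at the far end so every offset stays in bounds.
    start = 0 if speed >= 0 else travel
    frames = []
    for t in range(num_frames):
        offset = start + speed * t
        top, left = (offset, 0) if axis == 0 else (0, offset)
        frames.append(
            [row[left:left + crop_size] for row in image[top:top + crop_size]]
        )
    return frames, (axis, speed)


# Toy 8x8 "image" whose pixel value encodes its (row, column) position.
image = [[10 * r + c for c in range(8)] for r in range(8)]
frames, label = pseudo_motion_clip(image, num_frames=4, crop_size=4, speed=1, axis=1)
```

Each frame is the same image content seen through a shifted window, so the only signal that distinguishes one label from another is the motion itself, which is presumably why such a task can be built from image datasets alone.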

“…Modelling the temporal dynamics is essential for a genuine understanding of videos. Hence, it is widely explored in both supervised [20,35,48,49,63,70] and self-supervised paradigm [28,29,34,36,39]. Self-supervised approaches learns temporal modelling by solving various pre-text tasks, such as dense future prediction [28,29], jigsaw puzzle solving [36,39], and pseudo motion classification [34], etc.…”

Section: Related Work (mentioning)

confidence: 99%

“…Hence, it is widely explored in both supervised [20,35,48,49,63,70] and self-supervised paradigm [28,29,34,36,39]. Self-supervised approaches learns temporal modelling by solving various pre-text tasks, such as dense future prediction [28,29], jigsaw puzzle solving [36,39], and pseudo motion classification [34], etc. Supervised video recognition explores various connections between different frames, such as 3D convolutions [62], temporal convolution [63], and temporal shift [48], etc.…”

Section: Related Work (mentioning)

confidence: 99%

TCTrack: Temporal Contexts for Aerial Tracking

Cao, Huang, Peng et al.

2022

Preprint

Self Cite

Temporal contexts among consecutive frames are far from being fully utilized in existing visual trackers. In this work, we present TCTrack, a comprehensive framework to fully exploit temporal contexts for aerial tracking. The temporal contexts are incorporated at two levels: the extraction of features and the refinement of similarity maps. Specifically, for feature extraction, an online temporally adaptive convolution is proposed to enhance the spatial features using temporal information, which is achieved by dynamically calibrating the convolution weights according to the previous frames. For similarity map refinement, we propose an adaptive temporal transformer, which first effectively encodes temporal knowledge in a memory-efficient way, before the temporal knowledge is decoded for accurate adjustment of the similarity map. TCTrack is effective and efficient: evaluation on four aerial tracking benchmarks shows its impressive performance; real-world UAV tests show its high speed of over 27 FPS on NVIDIA Jetson AGX Xavier.

“…Huang et al [ 103 ] used SSL to address the problem of labelling video datasets that required a huge number of human annotators. SSL was used in their proposed model motion from static images (MoSI) to train video models by learning representations from either video or image datasets.…”

Section: Self-supervised Learning (SSL) Approach (mentioning)

confidence: 99%

Building towards Automated Cyberbullying Detection: A Comparative Analysis

Al-Harigy, Al-Nuaim, Moradpoor et al.

2022

Computational Intelligence and Neuroscience

The increased use of social media among digitally anonymous users, sharing their thoughts and opinions, can facilitate participation and collaboration. However, this anonymity feature which gives users freedom of speech and allows them to conduct activities without being judged by others can also encourage cyberbullying and hate speech. Predators can hide their identity and reach a wide range of audience anytime and anywhere. According to the detrimental effect of cyberbullying, there is a growing need for cyberbullying detection approaches. In this survey paper, a comparative analysis of the automated cyberbullying techniques from different perspectives is discussed including data annotation, data preprocessing, and feature engineering. In addition, the importance of emojis in expressing emotions as well as their influence on sentiment classification and text comprehension leads us to discuss the role of incorporating emojis in the process of cyberbullying detection and their influence on the detection performance. Furthermore, the different domains for using self-supervised learning (SSL) as an annotation technique for cyberbullying detection are explored.
