A study of the generalizability of self-supervised representations
2021 · DOI: 10.1016/j.mlwa.2021.100124

Cited by 19 publications (11 citation statements) · References 44 publications
“…While both object detection and instance segmentation target localization of arbitrary class objects, 3DHPSE only targets a single class, the human. For inference on arbitrary class objects, learning a wide range of general features unlimited to labels of a dataset could be advantageous in the generalization aspect (Tendle & Hasan, 2021). However, for 3DHPSE, a backbone network is preferred to learn more about human features rather than features of arbitrary objects, given the limited learning capacity.…”
Section: Pre-training on ImageNet
confidence: 99%
“…Hence, SSL is not limited to learning only the label-relevant features that help predict the frequent classes, but rather a diverse set of generalizable representations, including both label-relevant and irrelevant features from unlabeled data. Learning during the pretext task also contributes to the representation-invariance property of an SSL model (Tendle & Hasan, 2021), such that it captures the ingrained characteristics of the input distribution, that are generalizable or transferable to downstream tasks. Therefore, SSL methods can generalize to rare classes better than SL approaches.…”
Section: Figure 11 Confusion Matrix for Nearest Neighbor Contra…
confidence: 99%
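The representation-invariance property described in the statement above is typically induced by a contrastive pretext task: two augmented views of the same input form a positive pair, and every other embedding in the batch acts as a negative. As an illustration only (none of this code appears in Tendle and Hasan's paper), here is a minimal numpy sketch of a SimCLR-style NT-Xent loss:

```python
import numpy as np

def nt_xent_loss(z_a, z_b, temperature=0.5):
    """Simplified NT-Xent (normalized temperature-scaled cross-entropy).

    z_a, z_b: (N, D) embeddings of two augmented views of the same N
    inputs. Positive pairs are (z_a[i], z_b[i]); all other batch
    embeddings serve as negatives.
    """
    z = np.concatenate([z_a, z_b], axis=0)            # (2N, D)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # unit-normalize
    sim = z @ z.T / temperature                       # scaled cosine similarity
    n = len(z_a)
    np.fill_diagonal(sim, -np.inf)                    # exclude self-pairs
    # index of each embedding's positive partner in the concatenated batch
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    # cross-entropy of each row against its positive partner
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), pos].mean()
```

Minimizing this loss pulls the two views of each input together without ever consulting a label, which is one concrete sense in which the learned features are not limited to label-relevant ones.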
“…Therefore, SSL methods can generalize to rare classes better than SL approaches. SSL's robustness to class imbalance is thoroughly demonstrated by Liu, Zhang et al (2021), and the generalizability of self-supervised representations is discussed by Tendle and Hasan (2021).…”
Section: Figure 11 Confusion Matrix for Nearest Neighbor Contra…
confidence: 99%
“…While supervised monocular depth estimation currently outperforms self-supervised methods, the performance of the latter is converging towards that of supervised ones. Additionally, research has shown that self-supervised methods are better at generalizing across a variety of environments [26] (e.g., indoor/outdoor, urban/rural scenes). Many works assume the entire 3D world is a rigid scene, thus ignoring objects that move independently.…”
Section: Introduction
confidence: 99%
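Self-supervised depth methods of the kind cited above typically train by view synthesis: a source frame is warped into the target view using the predicted depth and camera pose, and the network minimizes a photometric error between the target frame and the warped frame. As an illustration only (the warping step is omitted, and the SSIM term below uses global rather than the usual windowed statistics), here is a minimal numpy sketch of such a photometric error:

```python
import numpy as np

def photometric_loss(target, warped, alpha=0.85, eps=1e-6):
    """Photometric error between a target frame and a source frame warped
    into the target view (hypothetical inputs: HxWx3 float arrays in [0, 1]).

    Mixes an L1 term with a crude SSIM-style term, a combination common
    in self-supervised depth pipelines.
    """
    l1 = np.abs(target - warped).mean()
    # crude SSIM using global statistics (real pipelines use local windows)
    mu_t, mu_w = target.mean(), warped.mean()
    var_t, var_w = target.var(), warped.var()
    cov = ((target - mu_t) * (warped - mu_w)).mean()
    c1, c2 = 0.01 ** 2, 0.03 ** 2
    ssim = ((2 * mu_t * mu_w + c1) * (2 * cov + c2)) / (
        (mu_t ** 2 + mu_w ** 2 + c1) * (var_t + var_w + c2) + eps
    )
    return alpha * (1 - ssim) / 2 + (1 - alpha) * l1
```

Because the supervision signal is reconstruction of the scene itself rather than any label, this objective is available in any environment, which is consistent with the generalization advantage the statement attributes to self-supervised methods.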